Abstract


This  thesis  continues  the  research  begun  by  Storm,  Bauer  and  Oxley  in  2003  into 
the  fusion  of  classifiers.  It  examines  the  fusion  of  up  to  three  correlated  classifiers  using 
three  different  fusion  techniques.  The  overall  objective  was  to  determine  the  optimal 
ensemble  of  classifiers  to  maximize  the  expected  classification  accuracy.  The  ISOC 
fusion  method  (Haspert,  2000),  the  ROC  “Within”  fusion  method  (Oxley  and  Bauer, 
2002)  and  a  Probabilistic  Neural  Network  were  the  three  fusion  techniques  employed  in 
this set of experiments. Performance of the classifiers and the fusion methods is
measured via ROC curves. Two possible configurations of feature correlations were
examined. The expected true positive value relative to a prior distribution of correlation
levels for each configuration was then used to compare the performance of the individual
classifiers and the fused classifiers, thereby allowing for the selection of an optimal ensemble.
AN INVESTIGATION OF THE OPTIMAL SENSOR ENSEMBLE FOR SENSOR FUSION

I.  Introduction 

Background 

Effective Command and Control (C2) depends in large part on the ability to
accurately identify all of the hostile, friendly and neutral entities in the battlespace,
a capability referred to as Combat Identification (CID). Accurate CID hinges on the ability to
effectively  process  data  to  build  a  three-dimensional  picture  of  the  battlespace.  This  in 
turn  permits  real-time  application  of  tactical  options  so  weapons  can  be  employed  at 
optimal  ranges  against  the  most  critical  enemy  targets  (Peters  and  Ryan,  1998).  In  other 
words,  commanders  require  accurate  CID  to  obtain  a  situational  awareness  of  the 
battlespace  which  allows  them  to  effectively  prosecute  their  operations. 

Across  the  Department  of  Defense  (DoD)  reliable  CID  in  operations  has 
consistently  proven  to  be  an  elusive  capability.  Thirty-five  Americans  were  killed  and  72 
wounded  due  to  “friendly  fire”  or  fratricide  during  the  Gulf  War  (Report  to  Congress, 
1992).  Approximately  68  percent  of  these  incidents  appeared  to  be  the  result  of  target 
misidentification and/or coordination problems (Report to Congress, 1992). Since the
Coalition controlled the battlespace in every aspect of the war, these casualties
represented a need for better situational awareness (i.e., identification) of forces in the
battlespace.  Three  years  later,  two  F-15E  aircraft  shot  down  two  UH-60  Blackhawks 
over  Iraq  in  Operation  Provide  Comfort.  This  tragedy  also  illustrates  a  breakdown  in 
situational  awareness/combat  identification  in  that  the  F-15E  pilots  coordinated  with  an 
Airborne  Warning  and  Control  System  (AWACS)  aircraft  before  firing. 

Misidentification continued to be a problem even through the recent events in Operation
Iraqi Freedom. For example, on 25 March 2003 a Patriot surface-to-air battery in Iraq
came under mortar fire. The crew engaged the battery’s automatic systems and took
cover. The system’s radar misidentified a local F-16 as hostile and locked on to the
aircraft. The F-16 responded by firing a High-speed Anti-Radiation Missile (HARM) at
the battery, destroying its radar dish (Weisman, 2003). More cases could be mentioned,
but these serve to illustrate that misidentification and subsequent fratricide are a very real
problem for the U.S. military. In fact, Lt. Gen. Conway, who led over 85,000 Marines
into Operation Iraqi Freedom, has said that the amount of fratricide was probably his
biggest disappointment of the war (Conway, 2003).

Combat identification can be divided into four mission areas: air-to-air, air-to-ground,
ground-to-air and ground-to-ground. Each of these four mission areas has its
own architecture. There is no overarching architecture for CID (GAO Report, 2001).
For example, U.S. aircraft often use Identification Friend or Foe (IFF) to identify other
aircraft. Vehicles on the ground, however, might use thermal plates or thermal tape to
identify friendly forces. Not only are the sensors in each of these architectures different,
but the decision makers in each case vary as well. Frontline soldiers use their training
and understanding of the Rules of Engagement (ROE) to make friend or foe decisions.
The air forces, on the other hand, usually coordinate with an air operations controller.
Such  varied  environments  and  architectures  have  lent  themselves  only  to  partial 
solutions,  so  that  there  is  no  one  general  solution  for  CID  across  the  DoD  community. 

The  Joint  Combat  Identification  Advanced  Concept  Technology  Demonstration 
office  partitioned  target  determination  into  four  basic  CID  system  concepts.  A  majority 
of  CID  systems,  if  not  all,  fall  into  one  of  the  four  system  concepts. 

•  Some  systems  align  a  sensor  with  the  weapon  sight.  The  sensor  interrogates  the 
target.  A  reply  from  the  target  identifies  it  as  friendly,  otherwise  the  target  is 
unknown. 
•  “Don’t shoot me” systems use Global Positioning Systems (GPS) or location
systems.  A  weapon  system  sends  out  an  interrogation  in  all  directions  containing 
the  targeted  position.  Friendly  systems  in  the  area  return  a  “don’t  shoot  me” 
response. 

•  Situational  awareness  systems  receive  periodic  position  updates  from  friendly 
forces.  From  these  updates,  C2  systems  are  able  to  de-conflict  friendly  fire. 

•  Non-cooperative  target  recognition  systems  find  a  signature  in  acoustic  signals, 
thermal  and  electromagnetic  emissions  and  other  data  sources.  The  signature  is 
then  compared  to  a  database  to  determine  if  the  signature  is  indicative  of  a  hostile, 
friendly  or  neutral  target  (Garamone,  2003). 


Air Force CID systems predominantly use situational awareness and non-cooperative
target recognition concepts when identifying air-to-air and air-to-ground
targets. A few examples illustrate the diversity of applications these concepts have.
The AWACS uses the situational awareness concept when its radar tracks friendly
aircraft through its airspace. An Unattended Ground Sensor (UGS), on the other hand,
uses target signatures to identify vehicles on the ground. Some systems, like the Joint
Surveillance Target Attack Radar System (JSTARS), use more than one concept. JSTARS
tracks friendly forces while at the same time using target recognition to identify nearby
hostile forces. Fighters of all kinds use IFF systems. Finally, Intelligence, Surveillance and
Reconnaissance (ISR) platforms, like the U-2, use a variety of Electro-Optical (EO), Infrared
(IR) and radar sensors, among others, on a single platform to perform non-cooperative
target recognition. Such an array of sensors on so many aircraft requires a focal point to
fuse the sensors’ data, to analyze the intelligence, to identify contacts as hostile or
friendly, to form a situational awareness picture and, ultimately, to direct air operations.
This focal point is called an Air Operations Center (AOC).

The AOC is the weapon system by which the Joint Forces Air Component
Command (JFACC) commands and controls aerospace forces in a theater of operations
(AOC CONOPS, 2001). Within the AOC there is one cycle, or process, for finding and
prosecuting targets. That cycle is Find, Fix, Track, Target, Engage and Assess
(F2T2EA). Briefly, all sensor data, or selected sensor data, is sent through the AOC and
potential targets are found. The potential target is then identified as hostile or friendly
and located. Surveillance assets are then used to track targets until action can be taken.
Targeteers in the AOC determine the target’s priority per JFACC guidance and the
appropriate action. If targeteers assign a weapons platform to attack the target, then the
weapons system engages the target. Finally, after the engagement, Battle Damage
Assessment (BDA) is performed on the target to determine if it was destroyed.

Combat identification functions occur in the “Fix” cycle step. In this step, sensor
data has already arrived at the AOC indicating that there is an unknown contact. In some
instances, such as IFF, the sensor will give decision makers a positive or false
identification  of  the  target.  In  other  cases,  such  as  U-2  imagery,  an  analyst  will  be 
required  to  make  a  determination.  In  either  case,  analysts  may  fuse  the  intelligence  from 
the  sensors  with  other  intelligence  sources  and  the  AOC’s  situational  awareness.  The 
analyst  may  also  use  automated  target  recognition  and  target  cueing  tools  to  help  them 
identify  targets.  After  the  initial  analysis,  an  analyst  can  declare  the  target  hostile, 
friendly,  or  unknown.  If  the  target  is  still  unknown  after  the  initial  analysis,  intelligence 
collection  managers  can  cross-cue  a  different  sensor  to  the  same  target  to  take  advantage 
of  complementary  sensors  and  increase  the  analyst’s  target  identification  confidence. 

The  result  is  that  the  intelligence  analyst,  the  decision  maker,  gives  the  commander  target 
identification. 

In the case of Time Critical Targeting (TCT), intelligence analysts are required to
make a target determination in minutes. The AOC CONOPS looks to information
technologies to improve analysts’ ability to analyze sensor intelligence and confidently
identify targets. Furthermore, the document suggests that “information technology could
provide the decision-making tools, decision support systems, and simulations to enable
commanders to make better and quicker decisions” (AOC CONOPS, 2001). These
decision tools, decision support systems and simulations are all based on statistics and
probability.

The Air Force has tasked Air Combat Command (ACC) to study various
combinations of combat identification sensors through modeling, simulation and analysis.
Among the various efforts, ACC has supported basic research at the Air Force Institute of
Technology (AFIT) in sensor fusion. Last year, Capt Storm performed research for ACC
entitled “An Investigation of the Effects of Correlation in Sensor Fusion” (Storm, 2003).
Her research encompassed three fusion methods: the Identification System Operating
Characteristic (ISOC) method, the Receiver Operating Characteristic (ROC) method and
Neural Fusion, an example of which is fusion via Probabilistic Neural Networks (PNN).
This thesis builds upon her research.

Problem  Statement 

Decision makers, or analysts, are required to declare a target as friendly or hostile
using the available sensor data. If a determination cannot be made, the decision makers
must decide which additional sensor(s) to task to improve their probability of making a
correct identification. Additional sensor taskings are usually based on the decision
maker’s prior experience, with an expectation of improving the probability of correct
target identification. However, decision makers cannot prove which sensor, or sensor
ensemble, has the best probability of correctly identifying the target, nor quantify by how
much that probability will improve.
Research Objective

This  research  seeks  to  determine  the  optimal  sensor  ensemble  and  fusion 
technique  combination  across  differing  prior  correlation  distributions.  To  support  this 
objective,  the  research  will  develop  both  an  implementation  methodology  to  perform 
three-classifier  fusion  and  a  reasonable  optimality  criterion  to  measure  ensemble 
performance.  Secondly,  an  empirical  study  will  be  conducted  using  the  proposed 
methodology.  Lastly,  the  research  will  test  the  viability  of  creating  posterior  probabilities 
from  a  modified  radial  basis  function  neural  network. 

Assumptions/Limitations. 

Sensor  data  was  not  readily  available  for  this  effort;  hence,  sensors’  feature  data 
was  simulated  using  a  Matlab  program. 

Terminology. 

During  the  course  of  the  research  it  became  apparent  that  many  terms  in  the 
operational  world  are  synonymous  with  terms  in  the  statistical  world.  Often  writers  used 
these  terms  interchangeably.  Encapsulated  here  are  some  synonymous  terms  to  help  the 
reader.  Sensors  create  feature  data.  A  set  of  feature  data  comes  from  each  sensor  and 
because  each  data  set  is  associated  with  one  sensor  sometimes  the  terms  are  used 
interchangeably.  Automatic  Target  Recognition  (ATR)  software  or  human  operators 
examine  the  feature  data  looking  for  targets.  In  statistical  terms,  the  software  and 
operators  are  considered  classifiers  of  the  data.  Here  again  the  operators  will  identify  the 
targets,  while  statisticians  classify  exemplars.  A  hostile  target  misidentified  as  a  friend  is 
called  a  “leaker”  in  operations.  Misidentifying  a  friend  as  a  hostile  does  not  have  a 
specific term but leads to fratricide. Statistically, a hostile classified as hostile is called a
true  positive.  A  friendly  classified  as  a  hostile  is  called  a  false  positive. 

Implications. 

If expected sensor identification accuracies can be defined and quantified,
decision makers will be able to make more informed decisions regarding sensor
taskings. This will lead to more efficient use of high-demand/low-density ISR assets. More
accurate, timely decisions can be made regarding target identification.
II.  Literature Review


Introduction 

This chapter reviews literature relevant to this research. It begins with a brief
overview of the sensor fusion process to provide the context in which the literature will
be applied. The rest of the chapter then discusses the literature in the order in which it
pertains to the fusion process. The fusion process begins with the data. The data’s
statistical dependence most affects this research, so a discussion of statistical
dependence is included. Next, a review of posterior probabilities and neural networks
leads to a discussion of the modifications made to the radial basis function neural
network classifier. The fusion methods follow the classifiers in the fusion process, so a
description of the three fusion methods, the ISOC, ROC and PNN, comes next. Lastly, a
review of methods used to measure classifier and/or fusion performance completes the
literature review.

Fusion  Process  Overview 

The objective of the fusion methods is to yield a better classification than the
single best classifier alone. Training data is used to train the individual classifiers. The
classifiers then classify two other data sets, called fusion training and testing. The result
of this classification is two sets of posterior probabilities: fusion training posterior
probabilities and testing posterior probabilities. The two sets of posterior probabilities
are then sent to the fusion methods. The fusion methods train with the fusion training
posterior probabilities and are tested with the testing posterior probabilities. All
experimental designs used in this research are derived from this basic construct.
Figure 1: Fusion Overview


Statistical  Independence 

Both the ISOC and the ROC methods assume that the data coming from the
sensors are independent (Haspert, 2000; Oxley and Bauer, 2002). Statistically
independent data provides more new information than dependent data. A fusion method
should classify test data better given more training data. Some have attempted to find a
fusion method rule to apply to statistically dependent data. One attempt used two
classifiers on a bivariate Gaussian surface, the simplest fusion case possible (Willet,
2000). Three fusion rules, logical “AND”, “OR” and “XOR”, were applied to three
partitions of a Gaussian mean-shifted space. The “AND” rule could always classify data
with one threshold in one partition and could never classify the data with one threshold in
the second partition, and no consistent classification could be determined in the third
partition (Willet, 2000). Basically, even the simplest problem failed to yield a consistent
fusion rule for correlated data. With that in mind, this research seeks to determine how
resilient ISOC and ROC fusion rules are to correlated data.

Discriminant  Classifiers 

Given a new exemplar and the probability density functions of multiple classes,
discriminant analysis compares the ratio of the probability density functions at the new
exemplar. The ratio is then used to determine which class the exemplar belongs to. In
this particular experiment, the classes have a bivariate normal distribution so that the
discriminant calculates the ratio of the class one probability density function to the class
two probability density function. The following equation represents the multivariate
probability density function:

$$f(x_0 \mid i) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x_0 - \mu_i)'\,\Sigma^{-1}(x_0 - \mu_i)\right]$$

where $x_0$ represents a new exemplar, $p$ represents the number of features, $\mu_i$ represents
the mean of class $i$ and $\Sigma$ represents the pooled covariance of the classes. For this
research both classes are normally distributed and have equal covariance matrices. Bayes’
rule is applied to the discriminant analysis to produce posterior probabilities. The
following equation then produces the posterior probability for each class.

$$P(j \mid x_0) = \frac{\exp\left[-\frac{1}{2}(x_0 - \mu_j)'\,\Sigma^{-1}(x_0 - \mu_j)\right]}{\sum_{i=1}^{2} \exp\left[-\frac{1}{2}(x_0 - \mu_i)'\,\Sigma^{-1}(x_0 - \mu_i)\right]}$$

The quadratic discriminant operates exactly like the linear discriminant except that it does
not assume equal covariance matrices among the classes. By using normal distributions
and applying Bayes’ rule, we arrive at a different equation which produces the posterior
probability for each class.

$$P(j \mid x_0) = \frac{\exp\left[-\frac{1}{2}\ln|\Sigma_j| - \frac{1}{2}(x_0 - \mu_j)'\,\Sigma_j^{-1}(x_0 - \mu_j)\right]}{\sum_{i=1}^{2} \exp\left[-\frac{1}{2}\ln|\Sigma_i| - \frac{1}{2}(x_0 - \mu_i)'\,\Sigma_i^{-1}(x_0 - \mu_i)\right]}$$

where $\Sigma_j$ represents the covariance matrix of class $j$. In this way, posterior probabilities
were created for the linear and quadratic discriminant classifiers.
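
The two posterior calculations can be illustrated with a short sketch. It is not the code used in this research (the simulations were run in Matlab); it assumes equal prior probabilities for the classes, as the posterior equations imply, and the function names, means and covariance values are hypothetical.

```python
import numpy as np

def mahalanobis_sq(x0, mean, cov):
    """Squared Mahalanobis distance (x0 - mean)' cov^{-1} (x0 - mean)."""
    diff = np.asarray(x0, dtype=float) - np.asarray(mean, dtype=float)
    return float(diff @ np.linalg.solve(cov, diff))

def linear_posteriors(x0, means, pooled_cov):
    """Class posteriors under equal priors and one pooled covariance matrix."""
    scores = np.array([-0.5 * mahalanobis_sq(x0, m, pooled_cov) for m in means])
    weights = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return weights / weights.sum()

def quadratic_posteriors(x0, means, covs):
    """Class posteriors under equal priors with a separate covariance per class."""
    scores = np.array([
        -0.5 * np.log(np.linalg.det(c)) - 0.5 * mahalanobis_sq(x0, m, c)
        for m, c in zip(means, covs)
    ])
    weights = np.exp(scores - scores.max())
    return weights / weights.sum()

# Hypothetical two-class, two-feature example (values are illustrative only).
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
pooled = np.array([[1.0, 0.3], [0.3, 1.0]])
covs = [pooled, 2.0 * pooled]

p_lin = linear_posteriors([1.0, 1.0], means, pooled)     # exemplar midway between the means
p_quad = quadratic_posteriors([1.0, 1.0], means, covs)
```

When every class shares the same covariance matrix, the log-determinant term cancels in the ratio and the quadratic posterior reduces to the linear one.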

Modified  Radial  Basis  Function  Network 

The  standard  radial  basis  function  network  classifies  new  exemplars  based  on  the 
sum  of  their  weighted  distances  to  exemplars  of  known  classes.  This  method  outputs 
weights  for  each  class  given  a  new  exemplar.  It  does  not  produce  posterior  probabilities. 
Since  the  fusion  methods  require  posterior  probabilities,  the  standard  radial  basis  function 
was  modified. 

It has been shown that the outputs of a multilayer perceptron network
approximate the a posteriori probability function of the classes for any number of layers
and any type of activation function (Ruck et al., 1990). The network’s backpropagation
achieves this by minimizing its mean squared-error approximation to the Bayes optimal
discriminant function (Ruck et al., 1990). The mean squared-error approximation is
represented by the following equation:

$$\varepsilon^2(w) = \int_X \left[F(x, w) - g_0(x)\right]^2 p(x)\,dx$$

where $F(x, w)$ represents the perceptron output for new exemplar $x$ and weights $w$,
$g_0(x)$ represents the Bayes optimal discriminant function and $p(x)$ represents the density
function. Since a multi-layer perceptron network approximates the Bayes optimal
discriminant, it appears reasonable that a network utilizing a single perceptron output
node and a non-perceptron hidden layer would exhibit some of the same qualities. So the
modified radial basis function outputs are treated as posterior probabilities in this
research.

The  modified  radial  basis  function  architecture  consists  of  a  hidden  layer  of  radial 
basis  neurons  and  a  single  log-Sigmoid  output  node.  The  network  uses  n  number  of 
training  exemplars  each  with  m  number  of  features  to  establish  the  weight  vector  for  each 
neuron in the hidden layer, shown in Figure 2. The bias into the hidden layer is established
empirically by varying the neuron’s spread. The relationship between spread and bias
was as follows:

bias = 0.8326 / spread

Both the weight vector and the bias are used in the activation function for each radial
basis neuron, which is as follows:

$$h_i = e^{-\left(\lVert w_i - p \rVert\, b\right)^2}$$

where $\lVert w_i - p \rVert$ represents the distance between the neuron’s weight vector and the new
exemplar’s feature vector and $b$ represents the neuron’s bias.

The output node utilizes a Log-Sigmoid transfer function. The transfer function
requires weighted inputs and a bias to allow for training, so weights and bias are
initialized using the Nguyen-Widrow initialization method. The weighted activations and the
bias are then summed at the node and sent through the Log-Sigmoid transfer function.
The function takes their sum and maps the output to a [0,1] range.

$$\mathrm{logsig}(n) = \frac{1}{1 + e^{-n}}$$

The weights and bias are adjusted using updates from a gradient descent momentum
method and an adaptive learning rate. The output yields activations for class 1 only.
Class 0 would be its complement. Figure 2 illustrates the modified radial basis function
architecture.


Figure 2: Modified Radial Basis Function Network Architecture (inputs, hidden layer, output)
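
A minimal sketch of the forward pass through this architecture follows. It assumes the output-node weights and bias have already been trained; the Nguyen-Widrow initialization and the gradient descent training are not shown. The function names, centers and weight values are hypothetical.

```python
import numpy as np

def logsig(n):
    """Log-sigmoid transfer function: maps any real n into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

def modified_rbf_forward(p, centers, spread, out_weights, out_bias):
    """Radial basis hidden layer followed by a single log-sigmoid output node.

    p           : feature vector of the new exemplar, shape (m,)
    centers     : hidden-layer weight vectors (training exemplars), shape (n, m)
    spread      : radial basis spread; the hidden bias is 0.8326 / spread
    out_weights : output-node weights, shape (n,) (assumed already trained)
    out_bias    : output-node bias (assumed already trained)
    """
    b = 0.8326 / spread
    dist = np.linalg.norm(centers - p, axis=1)   # ||w_i - p|| for each hidden neuron
    h = np.exp(-(dist * b) ** 2)                 # radial basis activations
    return float(logsig(h @ out_weights + out_bias))  # activation for class 1

# Hypothetical values for illustration only.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
out_weights = np.array([2.0, -2.0])
near_first = modified_rbf_forward(np.array([0.0, 0.0]), centers, 1.0, out_weights, 0.0)
near_second = modified_rbf_forward(np.array([1.0, 1.0]), centers, 1.0, out_weights, 0.0)
class0_activation = 1.0 - near_first             # class 0 is the complement
```

With bias = 0.8326 / spread, a hidden neuron's activation falls to roughly 0.5 when the exemplar lies exactly one spread away from the neuron's weight vector, which is why the spread controls how local each neuron's response is.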


General  Regression  Neural  Networks 

General Regression Neural Networks (GRNN) subsume all other radial basis functions
(Wasserman, 1993). In fact, the GRNN network topology is identical to a normalized
radial basis function network (Wasserman, 1993) as shown in Figure 3. The GRNN
assigns the target values as the weights. Because zero and one are used as the target
values (class values), this has the effect of separating the two classes. The GRNN then
ignores any inputs from other classes until the final probability calculation. The result is that
the GRNN approaches an optimal estimator in the mean-squared-error sense (Geman,
Bienenstock and Doursat, 1992). Further research by Richard and Lippmann in 1991 has
shown that if a classifier is optimal in the mean-squared-error sense then, given enough
data, it will approach a Bayes optimal classifier. Finally, if the classifier approaches the
Bayes optimal classifier then its output will very closely approximate the posterior
probabilities. For purposes of this research, the posterior probabilities are desirable for
fusion.


Figure 3: General Regression Neural Network (inputs, hidden layer, output layer, outputs)
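
The normalized radial basis calculation can be sketched as a kernel-weighted average of the 0/1 target values. This is an illustration rather than the implementation used in this research; reusing the 0.8326 / spread bias convention from the radial basis layer is an assumption, and the function name and training data are hypothetical.

```python
import numpy as np

def grnn_predict(p, train_x, train_y, spread):
    """Kernel-weighted average of 0/1 targets, i.e. an estimate of P(class 1 | p)."""
    b = 0.8326 / spread                      # assumed spread-to-bias convention
    dist = np.linalg.norm(train_x - p, axis=1)
    h = np.exp(-(dist * b) ** 2)             # one radial basis activation per training exemplar
    # Targets act as the output weights; dividing by the sum normalizes the result.
    return float(h @ train_y / h.sum())

# Hypothetical training exemplars: two of class 0 and two of class 1.
train_x = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]])
train_y = np.array([0.0, 0.0, 1.0, 1.0])

p_near_class1 = grnn_predict(np.array([3.0, 3.5]), train_x, train_y, spread=1.0)
p_near_class0 = grnn_predict(np.array([0.0, 0.5]), train_x, train_y, spread=1.0)
```

Because the targets are zero and one, only class 1 exemplars contribute to the numerator, so the output is the share of total activation that comes from class 1. An exemplar very far from all training data would drive every activation toward zero, so a practical implementation would need to guard against dividing by a vanishing sum.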
Identification System Operating Characteristic Fusion Method

The  Identification  System  Operating  Characteristic  (ISOC)  method  provides  an 
optimal  sensor  fusion  rule  to  identify  new  targets.  The  process  to  identify  this  optimal 
rule  incorporates  the  ability  to  adapt  the  rule  to  the  current  environment.  Briefly,  the 
method  develops  all  possible  boolean  sensor  fusion  rules  from  a  training  data  set,  applies 
the  costs  and  probability  of  misidentification  to  the  rules  and  selects  the  optimal  fusion 
rule  based  on  the  minimum  cost.  If  the  targets  encountered  change,  the  training  data  set 
and,  hence,  the  set  of  possible  sensor  fusion  rules  will  change.  Likewise,  if  the  cost 
and/or  probability  of  misidentification  changes  then  the  set  of  all  the  possible  rules 
remains  the  same  but  the  optimal  rule  will  change.  This  leads  to  a  quantifiable  adaptation 
of  the  sensor  fusion  rule  to  the  anticipated  targets  and  operating  environments.  Current 
adaptation of sensor fusion identification rules usually involves some subjective and
often intuitive judgment or declared policy about the relative reliability or priority of
different identification sensors and procedures, the consequences of making identification
errors, and the nature of the threat environment (Ralston, 1998). The ISOC approach
quantifies  the  key  subjective  factors  to  produce  the  optimal  sensor  fusion  rule  leading  to 
the  lowest  total  expected  costs.  Figure  4  illustrates  the  ISOC  process. 
a.  Sensor Performance Matrices

The classification performance of each sensor can be expressed in a performance
matrix similar to Table 1, which shows a two-class, two-output-state example.
The number of true classes and output states in the matrix can be expanded if needed.


Table 1: Sensor Performance Matrix

                  True Class
Output State      H             F
“H”               P(“H”|H)      P(“H”|F)
“F”               P(“F”|H)      P(“F”|F)

b.  Combat  Identification  System  States 

To develop the Combat Identification System (CIS) states, the system of sensors
must utilize two separate indexing schemes, one for the sensors and one for each sensor’s
output states. The sensors’ index scheme will be 1 ≤ i ≤ N_s, where N_s represents the
total number of sensors and i represents a particular sensor in that system.

Table 2: Sensor Performance Matrix with Indices

Sensor (i)            Friend        Hostile
output state (1_i)    P(1_i|F)      P(1_i|H)
  ...                 ...           ...
output state (k_i)    P(k_i|F)      P(k_i|H)
  ...                 ...           ...
output state (n_i)    P(n_i|F)      P(n_i|H)

The second index scheme for the individual sensor’s output states will be 1 ≤ k_i ≤ n_i,
where n_i represents the total number of sensor output states for the i-th sensor and k_i
represents the specific output state of the i-th sensor. Table 2 illustrates the indexing
schemes for the i-th sensor given two target types. The number of output states may be
different for each sensor. However, the number of target types should remain the same
across all sensors.

The Combat Identification System (CIS) is the system of sensors used to identify
a particular target. So, one combination of the sensors’ output states defines one CIS
state (i.e., configuration). Under this definition, each sensor can only assume one output
state for any given CIS state. Let each CIS state be designated as S_j and define j as the
index that runs over all N states of the system, so that 1 ≤ j ≤ N (Ralston, 1998). Each
state forms a vector S_j = (s_1^j, s_2^j, ..., s_i^j, ..., s_{N_s}^j), where s_i^j is the output state
of the i-th sensor in the j-th CIS state, as illustrated in Table 3.

Table 3: CIS States (Sensors’ Output State Combinations)

j      S_j
1      (s_1^1, s_2^1, ..., s_i^1, ..., s_{N_s}^1)
j      (s_1^j, s_2^j, ..., s_i^j, ..., s_{N_s}^j)
N      (s_1^N, s_2^N, ..., s_i^N, ..., s_{N_s}^N)

There will be N distinct configurations of the overall CIS given by (Ralston, 1998)

$$N = \prod_{i=1}^{N_s} n_i$$

Assuming the sensors are independent, the probability of each CIS state given a
target type can be found by multiplying the probabilities of each sensor’s output state
given the same target type (T):

$$P(S_j \mid T) = \prod_{i=1}^{N_s} P(s_i^j \mid T)$$
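
The enumeration of CIS states and their probabilities under the independence assumption can be sketched as follows. The two performance matrices are hypothetical (one sensor with two output states and one with three), and the function names are illustrative only.

```python
from itertools import product
import numpy as np

# Hypothetical performance matrices: rows are output states, columns are (F, H).
# Each column sums to one because a sensor always reports exactly one state.
sensors = [
    np.array([[0.9, 0.2],    # sensor 1 reports "F"
              [0.1, 0.8]]),  # sensor 1 reports "H"
    np.array([[0.7, 0.1],    # sensor 2 reports "F"
              [0.2, 0.3],    # sensor 2 reports "unknown"
              [0.1, 0.6]]),  # sensor 2 reports "H"
]

def cis_states(sensors):
    """Enumerate all N = prod(n_i) combinations of sensor output states."""
    return list(product(*(range(s.shape[0]) for s in sensors)))

def cis_state_probs(sensors, target_col):
    """P(S_j | T) for every CIS state, assuming the sensors are independent."""
    return np.array([
        np.prod([s[k, target_col] for s, k in zip(sensors, state)])
        for state in cis_states(sensors)
    ])

states = cis_states(sensors)                          # N = 2 * 3 = 6 states
p_given_f = cis_state_probs(sensors, target_col=0)    # P(S_j | F)
p_given_h = cis_state_probs(sensors, target_col=1)    # P(S_j | H)
```

Because each sensor's column sums to one, the CIS state probabilities for a given target type also sum to one, which is a convenient check on the enumeration.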
c.  Identification Fusion Rules

Some  sensors’  output  states  within  any  particular  CIS  state  will  inevitably  yield 
conflicting  identifications.  The  identification  fusion  rule  must  resolve  all  possible 
conflicting  indications  from  two  or  more  of  the  individual  sensors,  specifically  whether 
or  not  to  declare  a  target  “hostile”  and  hence  engageable  for  each  of  the  N  states  of  the 
system  (Ralston,  1998). 

Each fusion rule can be expressed as a vector R = (r_1, r_2, ..., r_j, ..., r_N), where j
represents the CIS state index and r_j ∈ {0,1} represents either the inclusion (r_j = 1) or
exclusion (r_j = 0) of a particular CIS state in the fusion rule. The total number of distinct
possible fusion rules is 2^N (Ralston, 1998). The probability that a particular fusion rule
will correctly identify different target types can be found by multiplying the rule by the
CIS states’ conditional probabilities. The following equation (and an alternate form)
holds, where P(h | H) represents the probability of classifying a target as “hostile” given
that it is truly hostile.

$$P(h \mid H) = \sum_{j=1}^{N} P(S_j \mid H)\, R(j)$$

$$P(h \mid H) = \sum_{j=1}^{N} \left( \prod_{i=1}^{N_s} P(s_i^j \mid H) \right) R(j)$$

Conversely, the following probability represents the probability of classifying a friendly as a
“hostile.”

$$P(h \mid F) = \sum_{j=1}^{N} P(S_j \mid F)\, R(j)$$

The  conditional  probabilities  for  each  target  type  (i.e.,  Hostile,  Friend,  etc.)  are  found  for 
each  fusion  rule.  The  problem  is  to  choose  the  fusion  rule  R  to  maximize  declaring  a 
hostile  as  a  hostile  while  minimizing  declaring  a  friend  as  a  hostile  (Ralston,  1998). 
Unfortunately, there are too many fusion rules to test them all. Figure 5 illustrates the
number of possible fusion rules resulting from nine sensors, which is 2^9 or 512 fusion
rules.

[Figure 5 plot: both axes run from 0 to 1.0; horizontal axis labeled Probability (Hostile | Friend)]

Figure 5: Possible Fusion Rules (Haspert, 2000)

At the beginning two fusion rules are immediately obvious, “never declare
hostile” and “always declare hostile”. Let $R(j) = r_j$, that is, $R(j)$ is the $j$-th component of $R$.
The “never declare hostile” rule means that $R(j) = 0$ for all $j$ and is the most conservative
rule (Ralston, 1998). The next most conservative rule is to engage in the single state $j$ for
which the likelihood ratio $P(j|H)/P(j|F)$ is largest (Ralston, 1998). By repeating this
process, we create successively less conservative rules of engagement until the “always
engage” rule, $R(j) = 1$ for all $j$, is reached (Ralston, 1998). In other words, the likelihood
ratios $P(j|H)/P(j|F)$ are ordered. The plot of the cumulative probabilities of $P(j|H)$ vs
$P(j|F)$ according to the likelihood ratio order yields an Identification System Operating
Characteristic (ISOC). As might be expected, each point on the ISOC is a complete
identification fusion rule (Ralston, 1998). However, no point (i.e., fusion rule) appears
better than any other to serve as the best operating point. Determining that requires
additional information about the anticipated ratio of encountering true friends and true
hostiles in the theater of operations and about the costs of making identification errors of
different kinds (Ralston, 1998).
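
The likelihood-ratio ordering described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this research: the function name `isoc_points` and the four-state probabilities are hypothetical, and the state probabilities $P(j|H)$ and $P(j|F)$ are assumed to be already known.

```python
import numpy as np

def isoc_points(p_state_given_h, p_state_given_f):
    """Order CIS states by likelihood ratio and accumulate the ISOC points."""
    p_h = np.asarray(p_state_given_h, dtype=float)
    p_f = np.asarray(p_state_given_f, dtype=float)
    # Likelihood ratio P(j|H)/P(j|F); a state with P(j|F) = 0 is ranked first.
    safe_f = np.where(p_f > 0, p_f, 1.0)
    ratio = np.where(p_f > 0, p_h / safe_f, np.inf)
    # Most conservative states first (largest likelihood ratio).
    order = np.argsort(-ratio, kind="stable")
    # Each prefix of `order` is one fusion rule: R(j) = 1 for states in the prefix.
    p_hostile_given_h = np.concatenate(([0.0], np.cumsum(p_h[order])))
    p_hostile_given_f = np.concatenate(([0.0], np.cumsum(p_f[order])))
    return order, p_hostile_given_f, p_hostile_given_h

# Hypothetical four-state system (illustrative numbers only).
p_h = [0.15, 0.50, 0.05, 0.30]
p_f = [0.30, 0.05, 0.50, 0.15]
order, fp, tp = isoc_points(p_h, p_f)
```

The first point of the returned curve is the “never declare hostile” rule and the last is the “always declare hostile” rule; only N + 1 of the $2^N$ possible rules are ever evaluated.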

d.  Cost  of  Identification  Errors 

The costs of misidentification reflect a trade-off between the relative
undesirability of allowing enemy leakers versus incurring fratricide of friendly platforms
(Haspert, 2000). This trade-off uses a cost function to select an ISOC operating point
(i.e., fusion rule) that minimizes cost. The cost function is as follows:

$$C_T = C_{FN} \cdot P_H \cdot P_{FN} + C_{FP} \cdot P_F \cdot P_{FP}$$

where
$C_T$ = expected cost of misidentification
$C_{FN}$ = cost of not identifying hostile as hostile (e.g., a potential leaker)
$C_{FP}$ = cost of declaring a friend as hostile (e.g., fratricide)
$P_H$ = a priori probability of hostile
$P_F$ = a priori probability of friend
$P_{FN}$ = probability hostile not declared hostile
$P_{FP}$ = probability friend declared hostile

where $P_{FN}$ and $P_{FP}$ can also be written as

$P_{FN} = 1 - P_{TP}$
$P_{TP} = P(h \mid H)$
$P_{FP} = P(h \mid F)$

The a priori probabilities of hostile and friend are proportional to the relative numbers of
hostile targets, $N_H$, and friendly targets, $N_F$. Hence $P_H \sim N_H$ and $P_F \sim N_F$ (Haspert, 2000).
The command authority must make a subjective determination regarding the cost figures. The
total cost of misidentification is calculated for each ISOC operating point. The operating
point with the lowest total cost is determined to be the optimal operating point. The fusion
rule associated with that operating point is declared to be the optimal sensor fusion
identification rule.
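
As a rough sketch of the selection step, the cost function can be evaluated at every ISOC operating point and the minimum taken. The function name, the cost values and the ISOC points below are hypothetical, and the sketch assumes only two target types so that $P_F = 1 - P_H$.

```python
import numpy as np

def min_cost_operating_point(fp, tp, c_fn, c_fp, p_hostile):
    """Return the index of the lowest-cost ISOC point and all point costs."""
    fp = np.asarray(fp, dtype=float)
    tp = np.asarray(tp, dtype=float)
    p_friend = 1.0 - p_hostile   # assumption: hostile and friend are the only classes
    p_fn = 1.0 - tp              # P_FN = 1 - P_TP
    cost = c_fn * p_hostile * p_fn + c_fp * p_friend * fp
    return int(np.argmin(cost)), cost

# Hypothetical ISOC points and cost figures (illustrative numbers only).
isoc_fp = [0.0, 0.05, 0.2, 0.5, 1.0]
isoc_tp = [0.0, 0.5, 0.8, 0.95, 1.0]
best, cost = min_cost_operating_point(isoc_fp, isoc_tp,
                                      c_fn=1.0, c_fp=3.0, p_hostile=0.5)
```

With fratricide weighted three times as heavily as a leaker, the second operating point is selected; raising `c_fn` relative to `c_fp` moves the choice toward less conservative rules.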

Receiver  Operating  Characteristic  Fusion  Method 

Whereas  the  ISOC  method  finds  an  optimal  rule  to  fuse  two  or  more  classifiers, 
the Receiver Operating Characteristic (ROC) model finds the optimal thresholds needed in the
individual  classifiers  to  maintain  optimal  fusion  performance  for  a  fixed  fusion  rule 
(Storm,  2002).  The  ROC  model  discussed  in  this  paper  can  also  be  called  a  ROC 
“within”  model  because  two  classifiers  (sensors)  are  applied  to  the  same  feature  set 
(Oxley  and  Bauer,  2002).  The  classifiers  map  the  feature  set  into  two  different  label  sets. 
These  label  sets  are  then  combined  or  fused  via  optimal  thresholds  and  the  logical  “or” 
rule  into  a  single  system  label  set.  Figure  6  illustrates  the  ROC  “within”  fusion  process. 
Figure 6: ROC “Within” Fusion Process

e. Classifier Parameters

Let two classifiers $A_\theta$ and $B_\phi$, where $\Theta$ and $\Phi$ are the two parameter sets with
$\theta \in \Theta$ and $\phi \in \Phi$, act upon a set of feature vectors (i.e., exemplars) and label each
set of features according to a label set. In this case, the label set is $L = \{L_h, L_f\}$, where
$L_h$ means hostile label and $L_f$ means friend label. Further, let $X$ be the complete set of
feature vectors, so that $X_h$ represents the true set of feature vectors representing the
hostile class and $X_f$ represents the true set of feature vectors representing the friend
class. The normal definitions of true positive (TP), false positive (FP), true negative (TN)
and false negative (FN) are illustrated in Table 4. The probabilities of these conditions
for each classifier are determined.


Table 4: Class Definitions

                      True Class
Classified as:        H         F
“H”                   TP        FP
“F”                   FN        TN

Let $P^A_{TP}$ represent the probability of classifier $A_\theta$ correctly labeling a hostile exemplar,
a true positive. The following defines $P^A_{TP}$ as well as the other possible probabilities.

$$P^A_{TP} = \Pr(A_\theta(x) \in L_h \mid x \in X_h)$$
$$P^A_{FP} = \Pr(A_\theta(x) \in L_h \mid x \in X_f)$$
$$P^A_{TN} = \Pr(A_\theta(x) \in L_f \mid x \in X_f)$$
$$P^A_{FN} = \Pr(A_\theta(x) \in L_f \mid x \in X_h)$$

The definitions for $B_\phi$ are similar (Clutz, 2002). This concept along with a varied
threshold ($\theta$) generates a ROC curve. One such threshold could be the probability level
that determines if a set of features is hostile or friendly. For example, if the threshold is
0.5 and the set of features has a 0.6 probability of being hostile, then the set of features is
declared hostile. As the threshold varies from zero to one, the probabilities of TP and FP
change. The plot of FP vs TP as the threshold changes defines the ROC curve, so that
each point on the curve represents a threshold.
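
The threshold sweep can be illustrated with a short Python sketch. It assumes each exemplar already carries a posterior probability of being hostile and a true class label; the function name `roc_points` and the six example posteriors are hypothetical.

```python
import numpy as np

def roc_points(posteriors, is_hostile, thresholds):
    """Return (FP, TP) rates, one pair per threshold."""
    posteriors = np.asarray(posteriors, dtype=float)
    is_hostile = np.asarray(is_hostile, dtype=bool)
    fp, tp = [], []
    for t in thresholds:
        declared_hostile = posteriors >= t
        # TP rate: fraction of true hostiles declared hostile.
        tp.append(declared_hostile[is_hostile].mean())
        # FP rate: fraction of true friends declared hostile.
        fp.append(declared_hostile[~is_hostile].mean())
    return np.array(fp), np.array(tp)

# Hypothetical posteriors for three hostile and three friendly exemplars.
posteriors = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]
is_hostile = [True, True, True, False, False, False]
fp, tp = roc_points(posteriors, is_hostile, thresholds=[0.0, 0.5, 1.0])
```

At the 0.5 threshold the exemplar with posterior 0.6 is declared hostile, matching the example in the text.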

Once the hostile and friendly probabilities are determined for each classifier they
will be combined for all possible label sets. A conditional probability table for classifiers
$A_\theta$ and $B_\phi$ is given in Table 5.


Table 5: Conditional Probabilities for Two Classifiers

                ($A_\theta$, $B_\phi$) Reports as:
True State      “H, H”                 “H, F”                 “F, H”                 “F, F”
Friend          $P^A_{FP} P^B_{FP}$    $P^A_{FP} P^B_{TN}$    $P^A_{TN} P^B_{FP}$    $P^A_{TN} P^B_{TN}$
Hostile         $P^A_{TP} P^B_{TP}$    $P^A_{TP} P^B_{FN}$    $P^A_{FN} P^B_{TP}$    $P^A_{FN} P^B_{FN}$

f. Classifier Fusion

Let $C_{\theta,\phi}$ be defined as the concatenation of classifiers $A_\theta$ and $B_\phi$. As a result
of the concatenation, $C_{\theta,\phi}$ yields a concatenated label from the two individual labels. Some
rule or method will be required to reconcile the two labels should there be any conflicts. The
conditional probabilities in Table 5 are the starting point for resolving that conflict. The
table can also be titled “Conditional Probabilities for $C_{\theta,\phi}$” as it already
concatenates the two classifiers. Using the logical “or” rule to label hostile, $C_{\theta,\phi}$ will only
be declared friendly when both classifiers declare friendly. The general form of this equation
is $P_{TP} = 1 - P_{FN}$. When applied to this case,

$$
\begin{aligned}
P^C_{TP} &= 1 - P^C_{FN} \\
         &= 1 - (P^A_{FN})(P^B_{FN}) \\
         &= 1 - (1 - P^A_{TP})(1 - P^B_{TP}) \\
         &= 1 - \left(1 - P^A_{TP} - P^B_{TP} + (P^A_{TP})(P^B_{TP})\right) \\
         &= 1 - 1 + P^A_{TP} + P^B_{TP} - (P^A_{TP})(P^B_{TP}) \\
P^C_{TP} &= P^A_{TP} + P^B_{TP} - (P^A_{TP})(P^B_{TP})
\end{aligned}
$$

Applying the same logic to the hostile label using the logical “or” rule, $C_{\theta,\phi}$ will be
labeled hostile if either or both of classifiers $A_\theta$ and $B_\phi$ label hostile. Since
$P_{FP} = 1 - P_{TN}$, then $P^C_{FP} = 1 - P^C_{TN} = 1 - (P^A_{TN})(P^B_{TN})$. This results in
$P^C_{FP} = P^A_{FP} + P^B_{FP} - (P^A_{FP})(P^B_{FP})$.
The maximum $P^C_{TP}$ associated with each $P^C_{FP}$ value must now be found to
develop an optimal ROC curve. Let $r$ represent $P^C_{FP}$, $p$ represent $P^A_{FP}$ and $q$ represent
$P^B_{FP}$, so that $P^C_{FP} = P^A_{FP} + P^B_{FP} - (P^A_{FP})(P^B_{FP})$ can be restated as $r = p + q - pq$.
Solving for $q$ yields $Q(p) = (r - p)/(1 - p)$. Now let $r$ vary across all false positive values,
$r \in [0,1]$. For every $r$ value, let $p$ vary so that $p \in [0,r]$. Then for every $p$ value calculate a
corresponding $q$ value from the equation. The result is a $(p, q)$ vector for every $r$ value.

We seek the optimal pair $(p, q)$ that maximizes the $P^C_{TP}$ given by:

$$P^C_{TP} = P^A_{TP} + P^B_{TP} - (P^A_{TP})(P^B_{TP})$$

which becomes:

$$P^C_{TP}(r) = \max_{p \in [0,r]} \left[ f_A(p) + f_B(Q(p)) - f_A(p)\, f_B(Q(p)) \right]$$

The ROC curve of classifier A is a function and yields a true positive value (i.e., $f_A(p)$)
for every false positive value $p$. Likewise, the ROC curve of classifier B is a function
and yields a true positive value (i.e., $f_B(q)$) for every false positive value $q$. Since the
value of $q$ is derived from the $p$ value, $f_B(q)$ is seen as the composition $f_B(Q(p))$ in the
equation. In this way, each set of $(p, q)$ values generates a unique $P^C_{TP}$ value, therefore
we get a function, $f_C(r)$. The pair maximizing the equation is denoted as $p^*$ and $q^*$ and
the associated point is denoted as $(r^*, f_C(r^*))$. As $r$ varies over its range, a complete set
of fusion points is generated.
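
The search over $(p, q)$ pairs can be sketched as a grid search. The code below is an illustrative approximation rather than the procedure used in this research: it assumes each ROC curve is available as sampled (FP, TP) points with increasing FP values, evaluates $f_A$ and $f_B$ by linear interpolation, and uses hypothetical names (`fuse_or_rule`, `n_p`) and made-up ROC samples.

```python
import numpy as np

def fuse_or_rule(fp_a, tp_a, fp_b, tp_b, r_grid, n_p=201):
    """For each fused FP rate r, maximize the fused TP over p in [0, r]."""
    fused_tp, best_p, best_q = [], [], []
    for r in r_grid:
        p = np.linspace(0.0, r, n_p)
        # q = Q(p) = (r - p) / (1 - p); p = 1 only occurs when r = 1, use q = 0 there.
        denom = np.where(p < 1.0, 1.0 - p, 1.0)
        q = np.where(p < 1.0, (r - p) / denom, 0.0)
        tp_a_p = np.interp(p, fp_a, tp_a)   # f_A(p), linear interpolation
        tp_b_q = np.interp(q, fp_b, tp_b)   # f_B(Q(p))
        tp_c = tp_a_p + tp_b_q - tp_a_p * tp_b_q
        k = int(np.argmax(tp_c))
        fused_tp.append(tp_c[k])
        best_p.append(p[k])
        best_q.append(q[k])
    return np.array(fused_tp), np.array(best_p), np.array(best_q)

# Hypothetical ROC samples for classifiers A and B (illustrative only).
roc_a = (np.array([0.0, 0.1, 0.5, 1.0]), np.array([0.0, 0.6, 0.9, 1.0]))
roc_b = (np.array([0.0, 0.2, 0.6, 1.0]), np.array([0.0, 0.7, 0.9, 1.0]))
r_grid = np.linspace(0.0, 1.0, 11)
fused_tp, p_star, q_star = fuse_or_rule(*roc_a, *roc_b, r_grid)
```

Because the end points $p = 0$ and $p = r$ are always searched, the fused curve is never below either individual ROC curve when the product form of the “or” rule holds, that is, when the classifiers are independent.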

This process may also be described in terms of thresholds. Briefly, every point on
a ROC curve represents a true positive and false positive pair. These declaration pairs are
generated by comparing the original classifier posterior probabilities to different
threshold values. So each declaration pair is associated with a particular threshold.

Probabilistic Neural Network Method

The Probabilistic Neural Network (PNN) can be considered a deterministic
network that approaches Bayesian optimality given a large training data set
(Wasserman, 1993). Some other advantages include the network’s instantaneous
training and its robustness to noise (Wasserman, 1993). Its main drawback is that the
size of the hidden layer is proportional to the number of training exemplars, so the hidden
layer’s activation calculations may become excessively large.

[Figure 7 diagram: a new exemplar passes through the Input Layer, Hidden Layer, Summation and Output stages]

Figure 7: Probabilistic Neural Network Architecture
In this implementation, as shown in Figure 7, the PNN receives fusion training
posterior probabilities to establish the network and testing posterior probabilities to test
its classification abilities. The method described here assumes that both the training and
testing posterior probabilities are normalized. This simplifies the Euclidean distance
between the mean of the radbas neurons and the new exemplar. Given a new
exemplar $X = (x_1, x_2, \ldots, x_m)$ and a radbas neuron’s weights $X_{ci} = (x_{ci,1}, x_{ci,2}, \ldots, x_{ci,m})$, the
activation out of the radbas neuron can be represented as:

$$Z_{ci} = \exp\left[-\left(\lVert X - X_{ci} \rVert \cdot \text{bias}\right)^2\right]$$

Here $c$ represents the class and $i$ represents the pattern layer neuron. The implication is
that the training data groups the hidden layer neurons into classes. The bias to the hidden
layer is calculated as follows, where the spread is determined in the experiments.

$$\text{bias} = 0.8326 / \text{spread}$$

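The summation layer simply sums the activations associated with a given class
(Wasserman, 53). It can be represented as follows, where $n_c$ denotes the number of hidden
layer neurons belonging to class $c$:

$$S_c = \sum_{i=1}^{n_c} \exp\left[-\left(\lVert X - X_{ci} \rVert \cdot \text{bias}\right)^2\right]$$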

The output layer compares the sums of activations from the various classes. The class
having the maximum value receives a one; the other classes receive a zero. Figure 8
illustrates the PNN process. Since the PNN outperforms the other fusion methods, it
trains on only two-thirds of the fusion training posterior probabilities and tests on the
remaining one-third of the fusion training posterior probabilities.
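
The three PNN stages (radbas activation, per-class summation, winner-take-all output) can be sketched as follows. This is a minimal NumPy illustration, not the network used in the experiments: the function name `pnn_fuse`, the spread value and the example posteriors are hypothetical, and each training row is assumed to hold the posterior probabilities produced by the individual classifiers for one exemplar.

```python
import numpy as np

def pnn_fuse(train_x, train_labels, new_x, spread):
    """Classify new exemplars with a probabilistic neural network."""
    train_x = np.asarray(train_x, dtype=float)
    train_labels = np.asarray(train_labels)
    new_x = np.atleast_2d(np.asarray(new_x, dtype=float))
    bias = 0.8326 / spread
    # Hidden layer: one radbas neuron per training exemplar.
    dist = np.linalg.norm(new_x[:, None, :] - train_x[None, :, :], axis=2)
    activation = np.exp(-(dist * bias) ** 2)
    # Summation layer: add the activations of the neurons in each class.
    classes = np.unique(train_labels)
    sums = np.stack([activation[:, train_labels == c].sum(axis=1)
                     for c in classes], axis=1)
    # Output layer: the class with the largest sum receives a one.
    one_hot = np.zeros(sums.shape, dtype=int)
    one_hot[np.arange(len(new_x)), np.argmax(sums, axis=1)] = 1
    return classes, sums, one_hot

# Hypothetical posteriors from two classifiers (columns) for four exemplars.
train_x = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
train_labels = [0, 0, 1, 1]
classes, sums, one_hot = pnn_fuse(train_x, train_labels,
                                  [[0.15, 0.15], [0.85, 0.85]], spread=0.1)
```

The constant 0.8326 is approximately the square root of ln 2, so a neuron's activation falls to about 0.5 when the exemplar lies exactly one spread away from the neuron's weights.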
Figure 8: Probabilistic Neural Network Fusion Process


Classifier  Fusion  Considerations 

There  are  two  different  approaches  to  combining  a  set  of  classifiers.  The  first 
approach  selects  a  classifier  from  the  set  that  is  an  “expert”  in  a  particular  local  area  of 
the  feature  space  (Kuncheva,  2001).  Feature  vectors  drawn  from  this  local  area  are 
classified  by  the  “expert”  classifier.  Sometimes  more  than  one  “local  expert”  can  be  used 
for different local areas of the feature space. The second approach, classifier fusion,
assumes  that  all  classifiers  are  trained  over  the  whole  feature  space,  and  are  thereby 
considered  as  competitive  rather  than  complementary  (Kuncheva,  2001). 
Fusion is useful only if the combined classifiers are mutually complementary, that
is, classifiers should make different classification errors over the feature space (Roili,
2002). So, one tries to choose classifiers that are optimal in different regions of the
feature  space  (Roili,  2002).  Given  the  relationship  between  optimality  and  classification 
error,  some  have  considered  just  the  classifier  ensemble’s  error  diversity  over  the  feature 
space.  Unfortunately,  while  this  is  intuitive,  there  is  some  evidence  that  suggests  that  the 
diversity  of  classifiers  does  not  affect  the  overall  accuracies  of  the  combination  methods 
and  their  improvement  over  the  single  best  classifier  (Kuncheva,  2002).  An  experiment 
using  ten  different  measures  of  classifier  diversity  and  ten  different  combination  methods 
found  very  little  evidence  of  any  correlation  between  the  measures  of  diversity  and  the 
classifier  combination  performance  (Kuncheva,  2002). 

Performance  Measurements 

There  are  several  methods  of  measurement  used  to  determine  classifier 
performance.  ROC  curves,  for  example,  are  commonly  used  for  summarizing  the 
performance  in  automatic  target  recognition  when  classification  accuracy  alone  is  not 
sufficient  (Alsing,  2000).  The  ROC  curve  depicts  the  relationship  between  the  detection 
rate  (i.e.,  probability  of  true  positive)  and  false  alarm  rate  (i.e.,  probability  of  false 
positive)  as  a  decision  threshold  is  varied.  The  decision  threshold  usually  varies  between 
a  high,  or  conservative,  threshold  where  no  targets  are  detected  to  a  low,  or  aggressive, 
threshold  where  all  targets  are  detected  and  all  non-targets  are  labeled  as  targets.  Figure  9 
illustrates  the  probability  density  function  of  the  two  classes  as  they  relate  to  the  features. 
The threshold is varied; for example, the illustration shows a threshold of 20 percent.
By determining the true positive and false positive rates for each threshold, we are able to
construct the ROC curve, illustrated on the right.


[Figure 9 legend: Friendly, chance, Threshold]

Figure 9: Receiver Operating Characteristics Curve
III. Methodology


Overview 

This research consists of four classifier fusion experiments and one related
excursion. The main distinction between the four experiments is that each one uses a
different data set. The four data sets are a simple Gaussian, an XOR (close) Gaussian, an
XOR (spread) Gaussian and a “domino” Gaussian. The fusion process applied to each
data set remained the same across all the experiments. The linear discriminant classifier, the
quadratic discriminant classifier and the radial basis function were used as the classifiers.
Then the ROC, ISOC and PNN methods were used to fuse all combinations of two and
three classifiers. The ensemble results were then measured using two techniques: ROC
curves and expected true positive classification values. While the ROC curves are
described in the literature review, the expected true positive classification rates were
devised as part of this research and are described in the experiments’ methodology.

Since  the  classifiers,  fusion  methods  and  measures  of  performance  are  identically 
implemented  in  each  of  the  four  experiments,  their  methodology  is  described  only  once  in 
the  first  experiment.  Since  the  data  sets  vary  from  experiment  to  experiment,  the  data 
sets  will  be  described  in  the  methodology  for  each  experiment. 

An assumption was made, based upon related work, that the modified radial basis
network would approximate a Bayes optimal classifier closely enough that its output
could be treated as a posterior probability. The excursion compares the results of the
modified radial basis function to results from the general regression neural network (GRNN)
given the same data. The GRNN has been proven to approximate the Bayes optimal classifier,
so if the radial basis function approximates the GRNN then the assumption is reasonable.
For clarity, a list of the experiments and excursion is as follows:


Table 6: List of Experimental Designs

Experiment #    Experiment Description
1               Simple Gaussian Distribution (3 Good Classifiers)
2               XOR (close) Gaussian Distribution (No Good Classifiers)
3               XOR (spread) Gaussian Distribution (2 Good Classifiers)
4               Complex Gaussian Distribution (1 Good Classifier)
5               XOR Gaussian Distribution RBF vs GRNN

Experiment 1: Simple Gaussian Distribution

Data  Generation 

A feature data set represents all the data collected by various sensors on the same
targets. Any given sensor only generates a couple of features of that data set. The
classifiers represent the algorithm a sensor uses to determine the probability of a target in
its feature data. So data set features are broken out of the data set and sent to the
appropriate classifier to generate posterior probabilities of a target.

Let $F_i$ represent the set of features from sensor $i$ and let the aggregation of the
feature sets be represented as $F = F_1 \times F_2 \times \cdots \times F_N$, where $N$ represents the
number of feature sets. The terms “feature set” and “sensor” are used interchangeably
because a particular feature set is associated with a particular sensor. Each set of features
may contain any number of features. So let $f_j^i$ represent feature number $j$ in
feature set $F_i$. Table 7 illustrates the relationship of the exemplars, features and feature
sets.


Table 7: Exemplars, Features and Feature Sets

               $F_1$                  $F_i$                  $F_N$
Exemplars      $f_1^1$    $f_2^1$     $f_1^i$    $f_2^i$     $f_1^N$    $f_2^N$
1              0.199      0.923       0.155      0.87        0.029      0.211
2              0.626      0.76        0.365      0.678       0.401      0.492
m              0.327      0.648       0.024      0.478       0.878      0.695

Here we have assumed two features per sensor. If the features, $f_j^i$, are independent their
correlation matrix reduces to an identity.

The data generation process creates several feature sets. A generalized correlation matrix
would be constructed as follows with potential correlations within and between feature
sets.


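$$
\Sigma =
\begin{bmatrix}
\Sigma_{F_1,F_1} & \cdots & \Sigma_{F_1,F_N} \\
\vdots & \ddots & \vdots \\
\Sigma_{F_N,F_1} & \cdots & \Sigma_{F_N,F_N}
\end{bmatrix}
$$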

Inter-correlation between features in different feature sets causes the two feature
sets (sensors) to become dependent. Figure 10 below illustrates inter-correlation between
feature sets.
Exemplars      $F_1$               $F_i$               $F_N$
1              0.086    0.553      0.422    0.29       0.759    0.227
2              0.378    0.501      0.527    0.558      0.889    0.684
m              0.256    0.597      0.959    0.329      0.733    0.591

Figure 10: Inter-correlation between Feature Sets

In this research, a single classifier is used to classify a single feature set. The posterior
probabilities between the classifiers reflect the statistical independence or dependence
between the feature sets. Both the ROC and ISOC fusion methods assume that the
classifiers’ posterior probabilities, and by extension the feature sets, are independent. By
making the feature sets dependent, we violate this assumption and can quantify the
sensitivity of the fusion methods to this assumption.

Some fusion methods fuse two feature sets and then fuse a third feature set. The
correlation matrix must maintain the correct feature independence and correlation with
regard to its sub-matrices so that the fusion method works correctly. Having defined the
features and their correlations, we now define the distributions of the two classes of
features. Let $F^0$ represent the feature sets from class zero and $F^1$ represent the feature
sets from class one. All feature sets for each class are generated at the same time using a
multivariate normal distribution. The only difference between $F^0$ and $F^1$ is that the
means $\mu$ of their distributions are different, so that $F^0 \sim N(\mu^0, \Sigma)$ and
$F^1 \sim N(\mu^1, \Sigma)$. Once the features are generated for both classes, the features are
concatenated so that $F = F^0 \cup F^1$. All exemplars are then randomized. This data
generation process is repeated to provide the required three independent data sets. If
multiple correlation levels are required, this process generates three data sets for each
correlation level. The data is then presented to the first level classifiers.

For this particular experiment, the number of exemplars will vary so that
$m \in \{25, 50, 100\}$. The feature sets have independent features; however, the features are
correlated between feature sets. The correlation will vary from 0 to 1, where
$\rho \in \{0, 0.2, 0.4, 0.6, 0.8, 1.0\}$. Let $F_i$ represent the set of features, where $i \in \{1,2,3\}$. This
implies $F = F_1 \times F_2 \times F_3$ and the feature set space $F \subset \mathbb{R}^6$. All feature sets have two
independent features. This leads to a correlation matrix within each feature set.


$$
\Sigma_{F_1,F_1} = \Sigma_{F_2,F_2} = \Sigma_{F_3,F_3} =
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
$$


The following equation shows a possible two feature set correlation matrix using feature set
notation and features. This structure must be preserved in a three feature set correlation
matrix for the fusion methods.

$$
\begin{bmatrix}
1 & 0 & 0 & \rho \\
0 & 1 & \rho & 0 \\
0 & \rho & 1 & 0 \\
\rho & 0 & 0 & 1
\end{bmatrix}
$$

The  three-feature  set  correlation  matrix  is  presented  here  in  terms  of  features. 


[Figure 11 matrix: 6 × 6 feature correlation matrix with ones on the diagonal and off-diagonal entries equal to 0 or ρ; first row: 1 0 0 ρ 0 ρ]

Figure 11: Feature Correlation Matrix




It can be seen that $f_1^1$ and $f_2^1$ are statistically independent, as are the other feature pairs
within each feature set. Further, it can be seen that correlation is induced between feature
sets, in that selected features from different feature sets are correlated at level $\rho$. The
feature correlation matrix, shown in Figure 11, was used to generate data for both classes;
however, the two classes used different means. Let $\mu^0 = \{0,0,0,0,0,0\}$ be the mean for
class 0 and $\mu^1 = \{1,1,1,1,1,1\}$ be the mean for class 1. Further, let $F^0$ be class 0 data
distributed as $F^0 \sim N(\mu^0, \Sigma)$ and $F^1$ be class 1 data distributed as $F^1 \sim N(\mu^1, \Sigma)$.


Figure 12: Two-Class Simple Gaussian Distributed Data

The two class data sets are aggregated as $F = F^0 \cup F^1$ and presented to the classifiers.

There  are  forty  repetitions  of  each  sample  size,  so  that  data  will  be  generated  forty  times 
for  each  sample  size.  Figure  12  illustrates  the  distribution  of  one  feature  set  with  two 
classes  of  data. 
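
The data generation steps can be sketched with NumPy. To stay with what the text fully specifies, this sketch uses the two feature set (4 × 4) correlation matrix shown earlier rather than the three feature set matrix of Figure 11. The function names, the seed and the choice of drawing the same number of exemplars from each class are assumptions made for illustration; unit variances are also assumed, so the correlation matrix doubles as the covariance matrix.

```python
import numpy as np

def two_set_correlation(rho):
    """Two feature sets, two features each; correlation only between sets."""
    return np.array([[1.0, 0.0, 0.0, rho],
                     [0.0, 1.0, rho, 0.0],
                     [0.0, rho, 1.0, 0.0],
                     [rho, 0.0, 0.0, 1.0]])

def generate_two_class_data(n_per_class, rho, mean0=0.0, mean1=1.0, seed=0):
    """Draw both classes from N(mu, Sigma), concatenate, then shuffle."""
    rng = np.random.default_rng(seed)
    sigma = two_set_correlation(rho)
    d = sigma.shape[0]
    f0 = rng.multivariate_normal(np.full(d, mean0), sigma, size=n_per_class)
    f1 = rng.multivariate_normal(np.full(d, mean1), sigma, size=n_per_class)
    features = np.vstack([f0, f1])
    labels = np.concatenate([np.zeros(n_per_class, dtype=int),
                             np.ones(n_per_class, dtype=int)])
    perm = rng.permutation(len(labels))   # randomize the exemplar order
    return features[perm], labels[perm]

# One hypothetical draw: 50 exemplars per class at correlation 0.6.
features, labels = generate_two_class_data(n_per_class=50, rho=0.6)
feature_set_1, feature_set_2 = features[:, :2], features[:, 2:]
```

Each classifier would then receive only its own pair of columns, so any dependence between the classifiers' posterior probabilities comes from the off-diagonal ρ entries.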

Classifiers 

Three  classifiers  were  used  in  this  experiment.  They  were  the  linear  discriminant, 
the  quadratic  discriminant  and  the  modified  radial  basis  function.  Each  classifier 
received  its  own  feature  set  from  the  training  data,  fusion  training  data  and  the  testing 
data.  The  data  within  the  feature  sets  were  independent.  The  feature  sets  themselves 
were  correlated.  The  classifiers  were  trained  with  the  training  data’s  feature  set.  Then 
the  classifiers  assigned  posterior  probabilities  to  each  exemplar  in  the  fusion  training  data 
and  testing  data.  These  two  sets  of  posterior  probabilities  were  then  sent  to  each  of  the 
three  fusion  methods. 

Fusion  Methods 

Some  decisions  were  made  in  applying  the  fusion  techniques.  The  ROC  method 
requires  that  the  ROC  curves  from  two  classifiers  be  fused  first  to  form  a  two-classifier 
fused  ROC  curve.  This  new  ROC  curve  is  then  fused  with  the  ROC  curve  from  the  third 
classifier  which  produces  the  final  three-classifier  fused  ROC  curve.  In  this  research,  the 
linear  and  quadratic  discriminant  classifiers  were  chosen  as  the  first  two  classifiers  and 
the  modified  radial  basis  function  was  chosen  as  the  third  classifier.  The  fusion  training 
posterior  probabilities  were  used  to  create  the  three-classifier  fused  ROC  curve.  The  final 
ROC  curve  was  then  tested  using  the  test  posterior  probabilities. 
The other two fusion methods are scalable to accept a greater or fewer number of
classifiers so no decisions were required. The ISOC method required only the addition of
the third classifier’s posterior probabilities to the Combat Identification System (CIS)
states.  The  PNN  method  only  required  additional  hidden  layer  nodes.  Once  both 
methods  had  been  trained  with  the  fusion  training  posterior  probabilities,  they  were  tested 
using  the  test  posterior  probabilities. 

Optimal  Ensemble  /  Fusion  technique  combination 

All  ensemble  /  fusion  technique  combination  classification  performances  were 
measured  using  two  techniques.  ROC  curves  are  applied  as  described  in  the  literature 
review. However, the second technique, expected true positive classification rates, was
devised for this research to serve as an optimality criterion.

The  expected  true  positive  classification  rates  were  calculated  for  all  ensemble  / 
fusion  technique  combinations  at  the  0.1  false  positive  level.  Each  ensemble  /  fusion 
technique  combination  classified  data  correlated  at  six  different  levels.  The  features 
correlated  at  the  different  correlation  levels  could  be  assigned  a  distribution  so  that  each 
correlation  level  would  have  a  discrete  probability  density  value.  Two  distributions  were 
used  in  this  research,  a  uniform  and  a  linear  distribution.  Their  values  from  0  to  0.9 
correlations  are  as  follows: 

Uniform:  p{x)  =  {0.167,0.167,0.167,0.167,0.167,0.167} 

Linear:  p{x)  =  {0.05,0.09,0.14,0.19,0.24,0.28} 

When  the  probability  densities  are  applied  to  the  true  positive  value  of  each  correlation  at 
the  0.1  false  positive  level,  then  the  expected  true  positive  classification  rate  can  be 
calculated.  The  expected  true  positive  classification  rate  is  calculated  for  each  ensemble  / 
fusion technique combination. The ensemble / fusion technique combination yielding the
highest  classification  rate  according  to  this  optimality  criterion  was  then  determined  to  be 
the  best.  A  mathematical  description  might  appear  as  follows: 

$$\underset{S \in R}{\arg\max}\; E_\rho[TP \mid S] \quad \text{s.t.} \quad FP(S) \le 0.1$$

where $E_\rho[TP] = \sum_{i=1}^{6} (TP \mid \rho = x_i) \cdot P(\rho = x_i)$

and $R$ represents the set of all possible single sensors and sensor fusions, where
ISOC, ROC and PNN fusion techniques are applied to all subsets with 2 or more sensors.
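
The criterion amounts to a weighted average of the true positive rates across the correlation levels. The sketch below uses the two prior distributions listed above; the ensemble names and their true positive rates are invented purely for illustration and are not results from this research. Each rate is assumed to have been read from the ensemble's ROC curve at the 0.1 false positive level.

```python
import numpy as np

# Prior weights over the six correlation levels, as listed in the text.
UNIFORM = np.array([0.167, 0.167, 0.167, 0.167, 0.167, 0.167])
LINEAR = np.array([0.05, 0.09, 0.14, 0.19, 0.24, 0.28])

def expected_tp(tp_by_level, prior):
    """Expected true positive rate relative to a prior over correlation levels."""
    return float(np.dot(np.asarray(tp_by_level, dtype=float), prior))

def best_ensemble(tp_table, prior):
    """Return the candidate with the highest expected TP and all scores."""
    scores = {name: expected_tp(tp, prior) for name, tp in tp_table.items()}
    return max(scores, key=scores.get), scores

# Hypothetical TP rates at FP = 0.1 for two candidates (not measured values).
tp_table = {
    "ensemble_x": [0.90, 0.90, 0.90, 0.90, 0.90, 0.90],
    "ensemble_y": [0.80, 0.82, 0.86, 0.92, 0.96, 0.98],
}
best_uniform, scores_uniform = best_ensemble(tp_table, UNIFORM)
best_linear, scores_linear = best_ensemble(tp_table, LINEAR)
```

In this made-up example the selected ensemble changes with the prior, which is the reason the criterion is evaluated under more than one distribution of correlation levels.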

Experiment  2:  XOR  (close)  Gaussian  Distribution 

Data  Generation 

The  data  generated  in  experiment  two  required  two  distribution  means  per  class. 
This  resulted  in  four  total  distributions  of  the  feature  data  in  an  XOR  pattern.  The  means 
used  were  as  follows: 

Class 0: $\mu_1^0 = \{0,0,0,0,0,0\}$ and $\mu_2^0 = \{1,1,1,1,1,1\}$

Class 1: $\mu_1^1 = \{0,1,0,1,0,1\}$ and $\mu_2^1 = \{1,0,1,0,1,0\}$

The Gaussian distributions possess equal variances, which are equal to one for all
distributions. Figure 13 shows the two-feature data set presented to the linear classifier.
Figure 13: Two-Class XOR (close) Gaussian Distributed Data

Experiment  3:  XOR  (spread)  Gaussian  Distribution 

Data  Generation 

The data generated for the third experiment used the same XOR pattern but increased
the distance between the individual Gaussian distributions. This experiment has the same
number of distributions as the second experiment.

Class 0: $\mu_1^0 = \{0,0,0,0,0,0\}$ and $\mu_2^0 = \{2.5,2.5,2.5,2.5,2.5,2.5\}$

Class 1: $\mu_1^1 = \{0,2.5,0,2.5,0,2.5\}$ and $\mu_2^1 = \{2.5,0,2.5,0,2.5,0\}$

Here again, the Gaussian distributions possess equal variances, which are equal to one for all
distributions. Figure 14 shows the two-feature data set presented to the linear classifier.

[Figure 14 axis label: Linear Discriminant Feature 1]

Figure 14: Two-Class XOR Gaussian Distributed Data


Experiment  4:  “Domino”  Gaussian  Distribution 
Data Generation

The data generated for the fourth experiment added a distribution to each class to
change the pattern into a “domino.” The means were as follows:

Class 0: $\mu_1^0 = \{0,0,0,0,0,0\}$, $\mu_2^0 = \{2.5,2.5,2.5,2.5,2.5,2.5\}$ and $\mu_3^0 = \{0,5,0,5,0,5\}$

Class 1: $\mu_1^1 = \{0,2.5,0,2.5,0,2.5\}$, $\mu_2^1 = \{2.5,0,2.5,0,2.5,0\}$ and $\mu_3^1 = \{2.5,5,2.5,5,2.5,5\}$

All variances were set equal to one for all distributions. Figure 15 shows the two-feature
data set presented to the linear classifier.

[Figure 15 axis label: Linear Discriminant Feature 1]

Figure 15: Two-Class “Domino” Gaussian Distributed Data
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Excursion  1:  Radial  basis  function  vs  general  regression  neural  network 

Data  Generation 

Both the simple Gaussian distributed data and the XOR Gaussian distributed data
were used for this experiment. However, the means used for the distributions were
changed to allow for greater separation of the data and better classification results. For
the simple Gaussian distributed data, let class 0 have $\mu^0 = \{0,0,0,0,0,0\}$ and let class 1
have $\mu^1 = \{2.5,2.5,2.5,2.5,2.5,2.5\}$. For the XOR Gaussian distributed data, let class 0
have $\mu_1^0 = \{0,0,0,0,0,0\}$ and $\mu_2^0 = \{2.5,0,2.5,0,2.5,0\}$ and let class 1
have $\mu_1^1 = \{2.5,2.5,2.5,2.5,2.5,2.5\}$ and $\mu_2^1 = \{0,2.5,0,2.5,0,2.5\}$.

Performance  Measurement 

ROC curves generated from each classifier’s results were used for comparison.
Classification accuracy is not the primary concern in this comparison. The main concern
is how well the radial basis function approximates the general regression neural network.
If the ROC curves from each classifier approximate one another, then the posterior
probabilities assigned approximate one another and the two classifiers are approximating
similar mean-squared errors.

IV. Findings and Analysis


Result 1: Two-Class Simple Gaussian Data Experiment

The methodology was applied to the two-class simple Gaussian dataset. The
findings first step through the ensembles’ ROC curve performance. The analysis then
compares the ensemble results using the expected true positive classification rates. Both
performance measures were necessary: while an ensemble may do well based on the
overall ROC curve, it may perform poorly at the particular threshold of interest.
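The distinction between the overall ROC curve and a single threshold of interest can be illustrated with a small sketch. It assumes each classifier or ensemble emits a posterior-like score per exemplar, with label 1 meaning hostile; the function names and the toy scores are hypothetical and are not the research code.

```python
import numpy as np

def roc_curve_points(scores, labels):
    """Sweep a threshold from high to low and return (thresholds, fpr, tpr)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Start above every score so the curve begins at the origin.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    hostile, friendly = scores[labels == 1], scores[labels == 0]
    tpr = np.array([(hostile >= t).mean() for t in thresholds])
    fpr = np.array([(friendly >= t).mean() for t in thresholds])
    return thresholds, fpr, tpr

def operating_point(scores, labels, threshold):
    """Return (P(fp), P(tp)) at one particular threshold of interest."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    declared_hostile = scores >= threshold
    return (float(declared_hostile[labels == 0].mean()),
            float(declared_hostile[labels == 1].mean()))

# Toy scores, purely illustrative.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
thresholds, fpr, tpr = roc_curve_points(scores, labels)
print(operating_point(scores, labels, threshold=0.5))  # (0.0, 0.5)
```

The curve summarizes every threshold at once, whereas `operating_point` reports the one pair of rates that matters at a chosen threshold, which is why the two measures can disagree.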

All single classifier ensembles had comparable performances. The data was
reasonably separated so that the linear classifier was able to distinguish between the two
classes.

Figure 16: Linear Classifier ROC Curve (SG)

Figure 17: Quadratic Classifier ROC Curve (SG)

Radial  basis  and  quadratic  classifiers  were  also  able  to  distinguish  between  the  classes. 
The  quadratic  classifier’s  performance  is  shown  in  Figure  17.  The  radial  basis  classifier’s 
performance  was  slightly  worse  than  the  quadratic  classifier. 

The PNN two-classifier ensemble reflects the posterior probabilities of the fused
classifiers. The linear-quadratic ensemble, having the two best classifiers, offers the best
classification performance. The linear-radial and quadratic-radial ensembles have
comparable, slightly degraded classification performance since they include the worst
classifier, the radial basis function neural network. The three-classifier ensemble
performs slightly better than the linear-quadratic ensemble at the lower correlation levels
but poorer at the higher levels of correlation. This can be attributed to the fact that the
PNN uses posterior probabilities from all three classifiers and ignores no information.
Lower correlation levels offer the PNN more data to train on, and the three-classifier
ensemble offers more data than a two-classifier ensemble.

Figure 18: PNN Fusion of Linear & Quadratic Classifiers (SG)

At the higher correlation levels, the three-classifier ensemble performs worse than
the two-classifier ensemble. This appears to occur because, while the three-classifier
ensemble offers more data than the two-classifier ensemble, it also includes more
errant classifications. In all cases, the PNN exhibits sensitivity to correlation levels.

Figure 19: PNN Fusion of Linear, Quadratic & Radial Basis Classifiers (SG)

The ISOC two-classifier ensembles exhibited “robustness” to correlation.
Two-classifier ensembles which included the radial basis classifier performed slightly worse.

Figure 20: ISOC Fusion of Linear & Quadratic Classifiers (SG)

However, the ROC curve of the three-classifier ensemble shows remarkable separation
between the correlation levels despite the fact that neither the single classifiers nor the
two-classifier ensembles exhibited any separation.

Figure 21: ISOC Fusion of Linear, Quadratic & Radial Basis Classifiers (SG)

Examination of the optimal ISOC rule for the ensemble revealed that the rule reduced to
a majority voting method. In Table 8, shown below, the ones represent a “hostile”
classification and zeros represent a “friendly” classification.


Table 8: Optimal ISOC Rule 3-Classifier Fusion

Linear    Quadratic    Radial Basis    Optimal ISOC Rule
  1           1             1                  1
  1           1             0                  1
  1           0             1                  1
  0           1             1                  1
  1           0             0                  0
  0           1             0                  0
  0           0             1                  0
  0           0             0                  0

Comparing the optimal ISOC rule with the ROC curves, it appears that the three
classifiers are making slightly different errors. The majority-vote style rule
eliminates the errors that a single classifier makes differently from the other two classifiers.
As a result, the fusion rule keeps only the errors common to all three classifiers
and this increases the ensemble’s performance. The separation of the correlation levels
can be explained by realizing that correlation represents the amount of new information
available to the ensemble. At the zero correlation level, the ensemble has the most new
information available and yields the best classification performance. As the correlation
increases, the ensemble has less new information with which to classify new exemplars and the
ensemble’s performance deteriorates.
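The observation that the optimal ISOC rule in Table 8 reduces to a majority vote can be checked mechanically. The short Python sketch below is only a verification aid; the dictionary simply transcribes Table 8 with states ordered as (linear, quadratic, radial basis), and the function name `majority_vote` is introduced here for illustration.

```python
from itertools import product

def majority_vote(outputs):
    """Return 1 (hostile) when more than half of the classifiers output 1."""
    return int(sum(outputs) > len(outputs) / 2)

# Table 8 transcribed as {(linear, quadratic, radial basis): optimal ISOC rule}.
table_8 = {
    (1, 1, 1): 1, (1, 1, 0): 1, (1, 0, 1): 1, (0, 1, 1): 1,
    (1, 0, 0): 0, (0, 1, 0): 0, (0, 0, 1): 0, (0, 0, 0): 0,
}

# Compare the rule with a majority vote over all eight output states.
matches = all(majority_vote(state) == table_8[state]
              for state in product((0, 1), repeat=3))
print(matches)  # True
```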

The ROC two-classifier ensembles performed as expected. The linear-quadratic
ensemble, having the two best classifiers, offered the best classification performance. The
linear-radial and quadratic-radial ensembles have comparable, slightly degraded
classification performance since they include the worst classifier, the radial basis.

Figure 22: ROC Fusion of Linear & Quadratic Classifiers (SG)

Figure 23: ROC Fusion of Linear, Quadratic & Radial Basis Classifiers (SG)

The three-classifier ensemble’s performance fell between the best and worst
two-classifier ensembles because it included all three classifiers.

The methodology to calculate the expected true positive classification rates was
then applied to each ensemble’s results. The rates are shown in Figure 24.

Figure 24: Simple Gaussian Expected True Positive Classification Rates


Of the three classifiers, the linear and quadratic outperformed the radial basis
classifier. This fact becomes significant as the fusion methods use the posterior
probabilities from these classifiers to form ensembles. ISOC two-classifier ensembles
have an expected true positive classification rate that follows the performance of the
classifiers being fused. However, the ISOC three-classifier ensemble’s performance is
better than any single classifier by approximately ten percent. This separation between
the single best classifier and the best ensemble was the greatest separation found across
all experiments. It is interesting to note that fusion, in the cases studied, only provided a
marginal increase in classification accuracy. The three-classifier ISOC ensemble
performance reflects the analysis described for the ROC curves. Briefly, the classifiers
are making different errors and the optimal ISOC fusion is able to identify some of the
individual classifiers’ errors. The ROC ensembles’ performance follows the performances
of their member classifiers, but is never worse than the worst classifier. In the PNN
two-classifier ensembles, the ensembles follow the performances of their member classifiers.
For the reasons discussed for the PNN three-classifier ensemble, the three-classifier ensemble
performs poorest of all PNN ensembles. The PNN ensembles nevertheless
outperform most of the other ensembles. In this case, even the poorest PNN ensemble
performs approximately as well as the best single classifier. Finally, the prior linear
distribution of correlation levels shows that the distribution only affects classification
when the ensemble is sensitive to correlation.

Result  2:  Two-Class  XOR  (close)  Gaussian  Data  Experiment 

All single classifier ensembles had comparable performances. The data was
poorly separated so that the classifiers had some difficulty distinguishing between the
two classes. The poor classification across all the classifiers caused all ensembles to
perform poorly as well. Figure 25 shows the linear classifier ROC curve. It is
representative of all the classifiers and of the ROC and PNN ensembles. Even the PNN
ensembles, which are usually sensitive to correlation levels, showed no distinction
between the 0 and 0.9 correlation levels. The ISOC ensembles provided the only
notable variation among the ensembles.

Figure 25: Linear Classifier ROC Curve


The ISOC ensembles cannot construct a complete ROC curve under the given
conditions. All three two-classifier ISOC ensembles and the three-classifier ISOC
ensemble failed to construct ROC curves over the full 0 to 1 false positive range.

Figure 26: ISOC - Linear/Quadratic Ensemble ROC Curve

The reason for this failure can be seen in the optimal ISOC rule and its application to the
classifiers’ posterior probabilities.


Table 9: Optimal ISOC Rule

Linear    Quadratic    Optimal ISOC Rule
  1           1                0
  0           1                1
  1           0                0
  0           0                0

If  a  poor  classifier  has  a  condition  where  its  P(fp)  >  P(tp)  at  the  threshold  used  to  develop 
the  ISOC  rule,  then  the  “all  hostile”,  or  (1,1),  sensor  output  state  will  not  be  the  first 
introduced  into  the  optimal  ISOC  rule.  Furthermore,  if  the  lowest  cost  ISOC  rule  only 
contains  one  sensor  output  state,  then  the  “all  hostile”  sensor  output  state  will  be 
excluded  from  the  optimal  ISOC  rule.  When  constructing  the  ROC  curve,  one  expects  to 
find  an  increasing  number  of  hostile  indications  with  lower  threshold  values.  And  hence, 
the  ROC  curve  will  reach  100  percent  probability  of  finding  all  true  positive  and  false 
positive  exemplars  when  the  threshold  reaches  zero.  However,  given  the  optimal  ISOC 
rule  in  Table  9,  when  the  threshold  approaches  zero  and  the  “all  hostile”  sensor  output 
state  becomes  prevalent,  the  optimal  ISOC  rule  labels  the  exemplars  as  “friendly”  and 
returns  the  ROC  curve  to  the  origin.  Appendix  A  provides  more  details. 
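The collapse of the ROC curve toward the origin can be reproduced with a minimal sketch. It assumes the rule of Table 9 is represented as the set of (linear, quadratic) output states that are declared hostile; the random posteriors and labels are hypothetical stand-ins for real classifier outputs, and the function names are introduced here for illustration only.

```python
import numpy as np

# Output states declared hostile by the rule in Table 9, as (linear, quadratic).
RULE_TABLE_9 = {(0, 1)}

def isoc_fuse(states, rule):
    """Declare hostile (1) only when the joint output state is in the rule."""
    return np.array([int(tuple(int(v) for v in s) in rule) for s in states])

def roc_point(posteriors, truth, threshold, rule):
    """Return (P(tp), P(fp)) of the fused decision at one threshold."""
    states = (posteriors >= threshold).astype(int)
    fused = isoc_fuse(states, rule)
    return float(fused[truth == 1].mean()), float(fused[truth == 0].mean())

rng = np.random.default_rng(0)
posteriors = rng.uniform(size=(1000, 2))   # hypothetical classifier outputs
truth = rng.integers(0, 2, size=1000)      # hypothetical labels, 1 = hostile

# As the threshold falls, the (1, 1) state dominates and the rule labels it friendly.
for threshold in (0.5, 0.1, 0.0):
    print(threshold, roc_point(posteriors, truth, threshold, RULE_TABLE_9))
```

At a threshold of zero every exemplar falls in the (1, 1) state, which the rule excludes, so both rates are zero rather than one.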

The expected true positive classification rates were determined for each ensemble
and provide no new insight. All ensembles performed poorly. The performance of the single
best classifier and that of the best ensemble were approximately the same. Among the ensembles
there appear to be very slight improvements and degradations of performance depending
upon the relative performance of the single classifier members. Figure 27 shows the
expected true positive classification rates.

Figure 27: XOR (close) Gaussian Expected True Positive Classification Rates
(Comparison of E(TP) Given Uniform & Linear Distributions)


Result 3: Two-Class XOR (spread) Gaussian Data Experiment

This experiment represents the case where one classifier fails while two other
classifiers perform well. The linear classifier performed about as well as chance, while
the quadratic and modified radial basis classifiers performed well. The two good
classifiers have nearly identical performances. Figure 28 illustrates the
linear performance and Figure 29 illustrates the quadratic and modified radial basis
performances.

Figure 28: Linear Classifier ROC Curve

Figure 29: Quadratic Classifier ROC Curve


The PNN’s two-classifier ensembles exhibited two notable characteristics. If
either of the good classifiers (i.e., quadratic or radial basis) was fused with the linear
classifier, the ensemble’s ROC curve results were very similar to the better classifier’s
results. In essence, the linear classifier was ignored and, since the remaining classifier had a very
tight correlation level range, the ensemble produced a very tight correlation level range.

Figure 30: PNN Fusion of Linear & Radial Basis Classifiers (XOR)

The quadratic-radial basis ensemble showed the best performance. Here again, the PNN
demonstrates better classification at lower correlations due to the increase in new data
from more independent features. Also, while both classifiers had very tight correlation
level ranges, they made different errors so that the ensemble classification performance
was better than either classifier’s performance alone.

Since the linear classifier was extremely poor, the two-classifier PNN ensemble
and the three-classifier PNN ensemble are nearly identical. In fact, given the
variation in the data, the two ROC curves could be considered identical.

Figure 31: PNN Fusion of Quadratic & Radial Basis Classifiers (XOR)

Figure 32: PNN Fusion of Linear, Quadratic & Radial Basis Classifiers (XOR)


The two-classifier ROC ensembles were similar to the two-classifier PNN
ensembles in that the linear classifier always has an equal probability of true positive
and false positive and its thresholds are largely of no effect. The quadratic-radial basis
ensemble shows some sensitivity to correlation levels.

Figure 33: ROC Fusion of Linear & Radial Basis Classifiers (XOR)

Figure 34: ROC Fusion of Quadratic & Radial Basis Classifiers (XOR)

The three-classifier ROC ensemble is identical to the quadratic-radial basis ensemble.

In the two-classifier ISOC ensembles, the method ignored the poor linear classifier
and classified the exemplars nearly exclusively the same as the better classifiers. There is
a very slight difference between the radial basis and quadratic classifiers so that when
ISOC fuses the two classifiers there is a slight classification improvement. The
three-classifier ISOC ensemble mostly ignores the linear classifier and performs nearly
identically to the quadratic/radial basis ISOC ensemble.

Figure 35: ISOC Fusion of Linear, Quadratic & Radial Basis Classifiers (XOR)

The expected true positive classification rates highlight the ensembles’
classification differences more dramatically. They illustrate the linear classifier’s poor
performance. They also show that the linear classifier detracted from the classification
performance of an ensemble in which it was included. Another trend which is readily
evident in Figure 36 is that the ensemble methods are fairly resilient to the poor
classifier’s effects. For example, the expected linear true positive classification rate is
approximately 10 percent. After fusion with any other classifier, the worst performance
was approximately 57 percent.

Figure 36: XOR (spread) Gaussian Expected True Positive Classification Rates
(Comparison of E(TP) Given Uniform & Linear Distributions)

The last significant remark regarding Figure 36 concerns the PNN ensemble
performance. In the linear/radial basis PNN ensemble, performance is reduced by
fusion with a poor classifier. However, in the case of the three-classifier PNN ensemble, the
performance is actually increased above that of the quadratic/radial basis PNN ensemble. The
question may be how the linear classifier can improve classification in one case and
reduce classification in another. In the linear/radial basis PNN ensemble, the PNN must use
all the training data given to it. It discards nothing, so it incorporates some of the linear
classifier’s errors when it fuses. The PNN, by design, cannot ignore the linear classifier.
However, in the three-classifier PNN fusion the quadratic and radial basis function
correct classifications outweigh the linear classifier’s errors. In addition, the linear
classifier and one of the other good classifiers must outweigh some of the errors of the
third to improve the three-classifier ensemble’s performance.


Result 4: Two-Class “Domino” Gaussian Data Experiment

This experiment represents the case where two classifiers perform poorly while
only one classifier performs well. The linear and quadratic classifiers perform nearly
identically poorly but the modified radial basis classifier performs well. Figure 37
illustrates the linear and quadratic classifiers’ performance while Figure 38 illustrates the
modified radial basis classifier’s performance.

Figure 37: Linear Classifier ROC Curve

Figure 38: Radial Basis Classifier ROC Curve

The three fusion methods ignore the poor classifiers in the fusion ensembles. All
two-classifier ensemble performances are nearly identical. Since the linear and quadratic
classifiers perform equally poorly and apparently make the same type of errors, all the
linear/quadratic fusion ensembles are also nearly identical. The only notable finding is
that the three-classifier PNN ensemble classification performance improves as the
correlation increases. Figure 39 shows the three-classifier PNN ensemble performance.
This can be attributed to the geometry of the data as the correlation levels increase. At
the 0 correlation level, the two-class data appears to be circular in two-space with
particular means and equal concentric distributions about the means. However, as the
correlation levels increase, the distribution of the data becomes oblong about the means.
This in turn leads to a better separation of the classes and hence better classification
performance. The “domino” case exhibited this phenomenon whereas the other data sets
did not because the means of the classes must be off-center from one another for the
elongating data to separate. Otherwise, the data elongates into the other class.

Figure 39: Three-Classifier PNN Ensemble ROC Curve

The expected true positive classification rates showed a variety of results across
the ensembles. As expected, the linear/quadratic fusion ensembles performed poorly.
The ISOC ensembles which include the modified radial basis classifier ignore the linear
and quadratic classifiers. However, the three-classifier ISOC ensemble does not ignore
the linear and quadratic classifiers. This is due to the fact that the modified radial basis
classifier performs poorly at the mid-points between the data. Given that the linear and quadratic
classifiers perform well at these points, they are included in the optimal ISOC rule.
Unfortunately, the two classifiers perform poorly overall and the overall classification
decreases. The ROC ensembles perform as expected. The fusion of poor classifiers with
a good classifier causes the classification accuracy to decrease. When two poor
classifiers are used the classification drops even more.

Figure 40: “Domino” Gaussian Expected True Positive Classification Rates
(Comparison of E(TP) Given Uniform & Linear Distributions)

Finally, the PNN ensembles appear to overcome the poor classifiers’ performance.
This can be attributed to the classifiers making different types of errors in the mid-point
region of the “domino.” The PNN ensemble can adjust for these errors using the training
data while the other methods cannot.


Result  5:  Radial  basis  function  vs  general  regression  neural  network 

An initial conjecture was made that the radial basis function approximates an
optimal Bayes classifier. A general regression neural network does, in fact, approximate
an optimal Bayes classifier. This experiment used scatter plots of both classifiers’
labeled exemplars and their respective ROC curves to compare the classification
performance and determine if the initial conjecture was reasonable.

The scatter plots show the GRNN and modified RBF classifiers’ classification of
the same exemplars. In this case, the GRNN classifier has a very distinct line of
classification between the two classes. The RBF, on the other hand, does not have that
distinct line of classification. There is some “bleeding” into each of the other classes’
space. The most noticeable difference between the two classifiers is that the RBF more
often misclassifies outliers. The differences in classification could be attributed to an
insufficient amount of data. The GRNN ROC curve very slightly outperforms the
modified RBF ROC curve but not enough to reject the concept that the modified RBF
classifier’s output could be used as posterior probabilities.

Figure 41: Modified Radial Basis Classifier ROC Curve

Figure 42: General Regression Neural Network Classifier ROC Curve


V. Conclusions


Introduction 

The  objective  of  the  research  was  to  determine  the  optimal  sensor  ensemble  and 
fusion  technique  combination  across  differing  prior  correlation  distributions.  To  this  end, 
four  different  feature  geometries  were  generated  and  classified  by  all  possible  ensembles. 
The  possible  ensembles  consisted  of  three  single  classifier  ensembles  and  all  possible 
combinations  of  the  single  classifier  ensembles  using  three  different  fusion  techniques. 
Finally,  the  ensemble  performances  were  measured  using  ROC  curves  and  expected  true 
positive  classification  rates. 

Conclusions 

Several conclusions can be drawn from the findings and analysis. When only
good classifiers were used, the ISOC ensemble was able to reduce the optimal ISOC rule
to a majority vote method which successfully eliminated individual classification errors.
Only errors common to all three classifiers affected the ISOC ensemble’s performance.
In this case, the ISOC fusion method outperformed both the ROC and PNN fusion methods.
When good and bad classifiers were used, the PNN ensembles consistently outperformed
the other fusion methods. Finally, when only poor classifiers were used, none of the three
fusion techniques could significantly improve the classification performance. In addition,
the optimal ISOC rule declared a target as “friendly” when the underlying classifiers
indicated the target was “hostile.” This caused the ROC curve construction to fail. The
last experiment affirmed that it was reasonable to use the modified radial basis function
neural network outputs as posterior probabilities.

Overall, the fusion techniques exhibited consistent characteristics. ROC fusion
always performed no worse than the worst classifier. ISOC and ROC techniques were
generally very “robust” to correlation. Lastly, fusion does not yield large increases in
classification accuracy above the single best classifier. It is useful in that it mitigates the
effects of poor classifiers. When a good and a poor classifier are fused by any method, the
resulting classification accuracy is generally close to that of the better classifier.

Recommendations for Future Research

There  are  several  possible  areas  for  future  research.  This  research  used  feature 
data  generated  from  code  developed  in  Matlab.  However,  the  next  logical  step  in  the 
investigation  of  fusion  ensembles  would  use  real  sensor  feature  data.  This  would  allow 
researchers  to  find  any  similarities  and  differences  between  the  artificial  environment  and 
the  “real”  world. 

Another  area  of  future  research  would  use  sensor  classifiers  appropriate  for  the 
actual  sensor  data  obtained.  These  classifiers  may  optimize  different  regions  of  the 
feature  space  or  be  sensitive  to  correlated  features  or  have  any  number  of  characteristics 
that will affect the classification accuracy. In addition, fusion techniques perform best
with classifiers possessing complementary error types, that is, classifiers that
make different types of errors.

Lastly, the number of classes used for this experimentation should be expanded to
three and/or four classes. In the operational world, there are non-combatants such as
civilians and international relief workers that do not participate in combat. A fourth class
would be hostile, friendly or non-combatant entities that cannot be identified with any degree of
confidence. This class might be called an “unknown” class.

Appendix A: ISOC Likelihood Ratios and Cost Rules


The  ISOC  method  uses  an  optimal  rule  to  classify  new  exemplars.  The  optimal 
rule  receives  the  classifiers’  output  states  and  if  the  combination  of  the  classifiers’  output 
states  matches  an  output  state  in  the  optimal  rule,  then  the  target  is  declared  hostile. 
Otherwise,  the  target  is  declared  friendly.  However,  when  the  ISOC  method  fuses  a  poor 
classifier  and  a  good  classifier,  it  may  exclude  the  output  state  of  both  classifiers 
indicating  hostile  from  the  optimal  rule.  The  effect  is  that  the  target  is  declared  friendly 
when  both  classifiers  indicate  hostile.  This  circumstance  requires  two  conditions  in  the 
ISOC  process.  First,  the  likelihood  ratio  of  some  output  state  must  be  higher  than  the 
likelihood  ratio  of  an  all  hostile  output  state.  Secondly,  the  cost  of  the  ordered  set  of 
rules  must  be  less  with  the  rule  associated  with  the  all  hostile  output  state  excluded. 

This research considers only a two class, two classifier problem. The possible
number of Combat Identification States (CIS) is defined by the number of classifiers and
output states. In this case, there are four CIS states. The probabilities of each CIS state
given a hostile target and given a friendly target are then found.

Table 10: ISOC Output States and Conditional Probabilities

State (S_i)   Classifier 1   Classifier 2   P(S_i | H)          P(S_i | F)
S_1                1              1         P(tp)_1 * P(tp)_2   P(fp)_1 * P(fp)_2
S_2                1              0         P(tp)_1 * P(fn)_2   P(fp)_1 * P(tn)_2
S_3                0              1         P(fn)_1 * P(tp)_2   P(tn)_1 * P(fp)_2
S_4                0              0         P(fn)_1 * P(fn)_2   P(tn)_1 * P(tn)_2

Here 1 represents a hostile target and 0 represents a friendly target. The true positive,
false positive, true negative and false negative indications are taken from this notation. A
confusion matrix would look as follows:

Table 11: Confusion Matrix

                        Truth
                      1       0
Classifier    1      tp      fp
output        0      fn      tn

The  likelihood  ratios  of  the  four  CIS  states  are  the  ratios  of  the  P(Si  |  H)  /  P(Si  |  F). 

LR,  =P{tp\*P{tp)JP{fp\*P{fp), 

LR^  =  P{tp),  ^P{fn)^  I  P{fp)i  *Pitn)2 
LR^  =  P{fn)y  *  P{tp)JP{tn)y  *  P{fp)2 
LRa  =  Pifn)i  *  Pifn)^  /  Pitn)i  *  P{tn)^ 

In this case, the first likelihood ratio represents the CIS state where both classifiers indicate
that the target is hostile. What is of interest is when one of the other likelihood ratios is
greater than the first.


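\begin{align*}
LR_3 &> LR_1 \\[4pt]
\frac{P(fn)_1 \cdot P(tp)_2}{P(tn)_1 \cdot P(fp)_2}
  &> \frac{P(tp)_1 \cdot P(tp)_2}{P(fp)_1 \cdot P(fp)_2} \\[4pt]
\frac{(1 - P(tp)_1) \cdot P(tp)_2}{(1 - P(fp)_1) \cdot P(fp)_2}
  &> \frac{P(tp)_1 \cdot P(tp)_2}{P(fp)_1 \cdot P(fp)_2} \\[4pt]
\frac{P(tp)_2 - P(tp)_1 \cdot P(tp)_2}{P(fp)_2 - P(fp)_1 \cdot P(fp)_2}
  &> \frac{P(tp)_1 \cdot P(tp)_2}{P(fp)_1 \cdot P(fp)_2} \\[4pt]
P(tp)_2 - P(tp)_1 \cdot P(tp)_2
  &> \frac{P(tp)_1 \cdot P(tp)_2 \cdot \left(P(fp)_2 - P(fp)_1 \cdot P(fp)_2\right)}{P(fp)_1 \cdot P(fp)_2} \\[4pt]
P(tp)_2 - P(tp)_1 \cdot P(tp)_2
  &> \frac{P(tp)_1 \cdot P(tp)_2 \cdot P(fp)_2 - P(fp)_1 \cdot P(fp)_2 \cdot P(tp)_1 \cdot P(tp)_2}{P(fp)_1 \cdot P(fp)_2} \\[4pt]
P(tp)_2 - P(tp)_1 \cdot P(tp)_2
  &> \frac{P(tp)_1 \cdot P(tp)_2 - P(fp)_1 \cdot P(tp)_1 \cdot P(tp)_2}{P(fp)_1} \\[4pt]
\frac{P(tp)_2 - P(tp)_1 \cdot P(tp)_2}{P(tp)_2}
  &> \frac{P(tp)_1 \cdot P(tp)_2 - P(fp)_1 \cdot P(tp)_1 \cdot P(tp)_2}{P(tp)_2 \cdot P(fp)_1} \\[4pt]
1 - P(tp)_1
  &> \frac{P(tp)_1 - P(fp)_1 \cdot P(tp)_1}{P(fp)_1} \\[4pt]
P(fp)_1 - P(tp)_1 \cdot P(fp)_1
  &> P(tp)_1 - P(fp)_1 \cdot P(tp)_1 \\[4pt]
P(fp)_1 &> P(tp)_1
\end{align*}
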


The derivation above shows that the third likelihood ratio is greater than the first when the
probability of a false positive is greater than the probability of a true positive for the first
classifier. The same could be said of the second likelihood ratio and the second classifier. Once it is
established that a likelihood ratio other than the all hostile likelihood ratio is greater, then
the ordered list of likelihood ratios will have that greater likelihood ratio as the first to enter the
optimal rule.
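
This condition can be spot-checked numerically. The sketch below is only an illustration, assuming independent classifiers and rates strictly between 0 and 1. It draws random rates and confirms that the third likelihood ratio exceeds the first exactly when the first classifier's false positive probability exceeds its true positive probability. The helper names `lr1`, `lr3` and `check_equivalence` are hypothetical.

```python
import random


def lr1(tp1, fp1, tp2, fp2):
    """Likelihood ratio of the all hostile state (1, 1)."""
    return (tp1 * tp2) / (fp1 * fp2)


def lr3(tp1, fp1, tp2, fp2):
    """Likelihood ratio of state (0, 1): fn_1 = 1 - tp_1, tn_1 = 1 - fp_1."""
    return ((1.0 - tp1) * tp2) / ((1.0 - fp1) * fp2)


def check_equivalence(trials=10_000, seed=0):
    """Return True if LR_3 > LR_1 agrees with P(fp)_1 > P(tp)_1 on every draw."""
    rng = random.Random(seed)
    for _ in range(trials):
        tp1, fp1, tp2, fp2 = (rng.uniform(0.01, 0.99) for _ in range(4))
        if abs(fp1 - tp1) < 1e-6:
            continue  # skip near-ties, where rounding could flip the comparison
        if (lr3(tp1, fp1, tp2, fp2) > lr1(tp1, fp1, tp2, fp2)) != (fp1 > tp1):
            return False
    return True


print(check_equivalence())
```

A random check of this kind does not replace the algebra, but it guards against sign errors when the inequality is transcribed.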

The second condition requires that the cost of the optimal rule without the all
hostile output state be less than the cost with the all hostile output state. Given the following
order in which output states enter the optimal rule, a comparison can be made between the
costs of the first and second possible optimal rules.

Table 12: Combat Identification States

            Combat Identification States (CIS)
Rule_i      (1,1)    (1,0)    (0,1)    (0,0)
Rule_1      0        0        1        0
Rule_2      1        0        1        0
Rule_3      1        0        1        1
Rule_4      1        1        1        1

The cost of the first rule, which contains only CIS output state (0,1) and corresponds to the
third likelihood ratio, needs to be less than the cost of the second rule. The second
rule includes the all hostile CIS output state. When the first cost is less than the second,
the ISOC method will exclude the all hostile CIS state from the optimal rule.

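\[
\mathrm{Cost}_1 < \mathrm{Cost}_2
\]
\[
C_{fp} \cdot P(F) \cdot P(fp)_1 + C_{fn} \cdot P(H) \cdot P(fn)_1
  < C_{fp} \cdot P(F) \cdot P(fp)_2 + C_{fn} \cdot P(H) \cdot P(fn)_2
\]
\[
P(fp)_1 + P(fn)_1 < P(fp)_2 + P(fn)_2
\]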
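
The cost comparison can be worked through numerically. The sketch below rests on several assumptions: the classifiers are independent, CIS states enter the rule in descending likelihood ratio order, a rule's false positive probability is the sum of P(S_i | F) over the states it declares hostile, and its false negative probability is one minus the sum of P(S_i | H) over those states. The function name `isoc_rule_costs`, its parameters and the example rates are hypothetical. The default equal costs and equal priors correspond to the simplified last line of the inequality.

```python
from itertools import product


def isoc_rule_costs(tp, fp, p_hostile=0.5, c_fp=1.0, c_fn=1.0):
    """Cost of each nested rule built by adding CIS states in LR order.

    Returns a list of (states_declared_hostile, cost), one entry per rule.
    """
    states = []
    for state in product((1, 0), repeat=len(tp)):
        p_h = p_f = 1.0
        for out, tp_k, fp_k in zip(state, tp, fp):
            p_h *= tp_k if out == 1 else 1.0 - tp_k
            p_f *= fp_k if out == 1 else 1.0 - fp_k
        states.append((state, p_h, p_f))
    # The state with the highest likelihood ratio enters the rule first.
    states.sort(key=lambda s: s[1] / s[2], reverse=True)

    p_friendly = 1.0 - p_hostile
    rules, declared = [], []
    rule_tp = rule_fp = 0.0
    for state, p_h, p_f in states:
        declared.append(state)
        rule_tp += p_h  # hostile targets the rule declares hostile
        rule_fp += p_f  # friendly targets the rule declares hostile
        rule_fn = 1.0 - rule_tp
        cost = c_fp * p_friendly * rule_fp + c_fn * p_hostile * rule_fn
        rules.append((tuple(declared), cost))
    return rules


# Assumed rates chosen so that classifier 1 has P(fp) > P(tp).
rules = isoc_rule_costs(tp=[0.3, 0.8], fp=[0.5, 0.5])
for declared, cost in rules:
    print(declared, round(cost, 4))
best = min(rules, key=lambda r: r[1])
print("all hostile state excluded:", (1, 1) not in best[0])
```

With these assumed rates the states enter in the order (0,1), (1,1), (0,0), (1,0), and the first rule is cheaper than the second, so the lowest-cost rule omits the all hostile state.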